Lilja Øvrelid

Also published as: Lilja Ovrelid


2020

Classification of Syncope Cases in Norwegian Medical Records
Ildiko Pilan | Pål H. Brekke | Fredrik A. Dahl | Tore Gundersen | Haldor Husby | Øystein Nytrø | Lilja Øvrelid
Proceedings of the 3rd Clinical Natural Language Processing Workshop

Loss of consciousness, so-called syncope, is a commonly occurring symptom associated with worse prognosis for a number of heart-related diseases. We present a comparison of methods for a diagnosis classification task in Norwegian clinical notes, targeting syncope, i.e. fainting cases. We find that an often neglected baseline with keyword matching constitutes a rather strong basis, but more advanced methods do offer some improvement in classification performance, especially a convolutional neural network model. The developed pipeline is planned to be used for quantifying unregistered syncope cases in Norway.

A Tale of Three Parsers: Towards Diagnostic Evaluation for Meaning Representation Parsing
Maja Buljan | Joakim Nivre | Stephan Oepen | Lilja Øvrelid
Proceedings of the 12th Language Resources and Evaluation Conference

We discuss methodological choices in contrastive and diagnostic evaluation in meaning representation parsing, i.e. mapping from natural language utterances to graph-based encodings of their semantic structure. Drawing inspiration from earlier work in syntactic dependency parsing, we transfer and refine several quantitative diagnosis techniques for use in the context of the 2019 shared task on Meaning Representation Parsing (MRP). As in parsing proper, moving evaluation from simple rooted trees to general graphs brings along its own range of challenges. Specifically, we seek to begin to shed light on relative strengths and weaknesses in different broad families of parsing techniques. In addition to these theoretical reflections, we conduct a pilot experiment on a selection of top-performing MRP systems and one of the five meaning representation frameworks in the shared task. Empirical results suggest that the proposed methodology can be meaningfully applied to parsing into graph-structured target representations, uncovering hitherto unknown properties of the different systems that can inform future development and cross-fertilization across approaches.

NorNE: Annotating Named Entities for Norwegian
Fredrik Jørgensen | Tobias Aasmoe | Anne-Stine Ruud Husevåg | Lilja Øvrelid | Erik Velldal
Proceedings of the 12th Language Resources and Evaluation Conference

This paper presents NorNE, a manually annotated corpus of named entities which extends the annotation of the existing Norwegian Dependency Treebank. Comprising both of the official standards of written Norwegian (Bokmål and Nynorsk), the corpus contains around 600,000 tokens and annotates a rich set of entity types including persons, organizations, locations, geo-political entities, products, and events, in addition to a class corresponding to nominals derived from names. We here present details on the annotation effort, guidelines, inter-annotator agreement and an experimental analysis of the corpus using a neural sequence labeling architecture.

A Fine-grained Sentiment Dataset for Norwegian
Lilja Øvrelid | Petter Mæhlum | Jeremy Barnes | Erik Velldal
Proceedings of the 12th Language Resources and Evaluation Conference

We here introduce NoReC_fine, a dataset for fine-grained sentiment analysis in Norwegian, annotated with respect to polar expressions, targets and holders of opinion. The underlying texts are taken from a corpus of professionally authored reviews from multiple news-sources and across a wide variety of domains, including literature, games, music, products, movies and more. We here present a detailed description of this annotation effort. We provide an overview of the developed annotation guidelines, illustrated with examples, and present an analysis of inter-annotator agreement. We also report the first experimental results on the dataset, intended as a preliminary benchmark for further experiments.

Building a Norwegian Lexical Resource for Medical Entity Recognition
Ildiko Pilan | Pål H. Brekke | Lilja Øvrelid
Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultilingualBIO 2020)

We present a large Norwegian lexical resource of categorized medical terms. The resource, which merges information from large medical databases, contains over 56,000 entries, including automatically mapped terms from a Norwegian medical dictionary. We describe the methodology behind this automatic dictionary entry mapping based on keywords and suffixes and further present the results of a manual evaluation performed on a subset by a domain expert. The evaluation indicated that ca. 80% of the mappings were correct.

2019

Reinforcement-based denoising of distantly supervised NER with partial annotation
Farhad Nooralahzadeh | Jan Tore Lønning | Lilja Øvrelid
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Existing named entity recognition (NER) systems rely on large amounts of human-labeled data for supervision. However, obtaining large-scale annotated data is challenging, particularly in specific domains like health-care, e-commerce and so on. Given the availability of domain-specific knowledge resources (e.g., ontologies, dictionaries), distant supervision is a solution to generate automatically labeled training data to reduce human effort. The outcome of distant supervision for NER, however, is often noisy. False positive and false negative instances are the main issues that reduce performance on this kind of auto-generated data. In this paper, we explore distant supervision in a supervised setup. We adopt a technique of partial annotation to address false negative cases and implement a reinforcement learning strategy with a neural network policy to identify false positive instances. Our results establish a new state-of-the-art on four benchmark datasets taken from different domains and different languages. We then go on to show that our model reduces the amount of manually annotated data required to perform NER in a new domain.

Probing Multilingual Sentence Representations With X-Probe
Vinit Ravishankar | Lilja Øvrelid | Erik Velldal
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions: first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by mapping sentence representations to English sentence representations, using sentences in a parallel corpus. We discover that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task.

Regression or classification? Automated Essay Scoring for Norwegian
Stig Johan Berggren | Taraka Rama | Lilja Øvrelid
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper we present first results for the task of Automated Essay Scoring for Norwegian learner language. We analyze a number of properties of this task experimentally and assess (i) the formulation of the task as either regression or classification, (ii) the use of various non-neural and neural machine learning architectures with various types of input representations, and (iii) applying multi-task learning for joint prediction of essay scoring and native language identification. We find that a GRU-based attention model trained in a single-task setting performs best at the AES task.

One-to-X Analogical Reasoning on Word Embeddings: a Case for Diachronic Armed Conflict Prediction from News Texts
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

We extend the well-known word analogy task to a one-to-X formulation, including one-to-none cases, when no correct answer exists. The task is cast as a relation discovery problem and applied to historical armed conflicts datasets, attempting to predict new relations of type ‘location:armed-group’ based on data about past events. As the source of semantic information, we use diachronic word embedding models trained on English news texts. A simple technique to improve diachronic performance in such a task is demonstrated, using a threshold based on a function of cosine distance to decrease the number of false positives; this approach is shown to be beneficial on two different corpora. Finally, we publish a ready-to-use test set for one-to-X analogy evaluation on historical armed conflicts data.

Sentiment Analysis Is Not Solved! Assessing and Probing Sentiment Classification
Jeremy Barnes | Lilja Øvrelid | Erik Velldal
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Neural methods for sentiment analysis have led to quantitative improvements over previous approaches, but these advances are not always accompanied by a thorough analysis of the qualitative differences. Therefore, it is not clear what outstanding conceptual challenges for sentiment analysis remain. In this work, we attempt to discover what challenges still prove a problem for sentiment classifiers for English and to provide a challenging dataset. We collect the subset of sentences that an (oracle) ensemble of state-of-the-art sentiment classifiers misclassifies and then annotate them for 18 linguistic and paralinguistic phenomena, such as negation, sarcasm, modality, etc. Finally, we provide a case study that demonstrates the usefulness of the dataset to probe the performance of a given sentiment classifier with respect to linguistic phenomena.

Annotating evaluative sentences for sentiment analysis: a dataset for Norwegian
Petter Mæhlum | Jeremy Barnes | Lilja Øvrelid | Erik Velldal
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This paper documents the creation of a large-scale dataset of evaluative sentences – i.e. both subjective and objective sentences that are found to be sentiment-bearing – based on mixed-domain professional reviews from various news-sources. We present both the annotation scheme and first results for classification experiments. The effort represents a step toward creating a Norwegian dataset for fine-grained sentiment analysis.

Lexicon information in neural sentiment analysis: a multi-task learning approach
Jeremy Barnes | Samia Touileb | Lilja Øvrelid | Erik Velldal
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This paper explores the use of multi-task learning (MTL) for incorporating external knowledge in neural models. Specifically, we show how MTL can enable a BiLSTM sentiment classifier to incorporate information from sentiment lexicons. Our MTL set-up is shown to improve model performance (compared to a single-task set-up) on both English and Norwegian sentence-level sentiment datasets. The paper also introduces a new sentiment lexicon for Norwegian.

Multilingual Probing of Deep Pre-Trained Contextual Encoders
Vinit Ravishankar | Memduh Gökırmak | Lilja Øvrelid | Erik Velldal
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

Encoders that generate representations based on context have, in recent years, benefited from adaptations that allow for pre-training on large text corpora. Earlier work on evaluating fixed-length sentence representations has included the use of ‘probing’ tasks, which use diagnostic classifiers to attempt to quantify the extent to which these encoders capture specific linguistic phenomena. The principle of probing has also resulted in extended evaluations that include relatively newer word-level pre-trained encoders. We build on probing tasks established in the literature and comprehensively evaluate and analyse – from a typological perspective amongst others – multilingual variants of existing encoders on probing datasets constructed for 6 non-English languages. Specifically, we probe each layer of multiple monolingual RNN-based ELMo models, the transformer-based BERT’s cased and uncased multilingual variants, and a variant of BERT that uses a cross-lingual modelling scheme (XLM).

2018

SIRIUS-LTG-UiO at SemEval-2018 Task 7: Convolutional Neural Networks with Shortest Dependency Paths for Semantic Relation Extraction and Classification in Scientific Papers
Farhad Nooralahzadeh | Lilja Øvrelid | Jan Tore Lønning
Proceedings of The 12th International Workshop on Semantic Evaluation

This article presents the SIRIUS-LTG-UiO system for the SemEval 2018 Task 7 on Semantic Relation Extraction and Classification in Scientific Papers. First we extract the shortest dependency path (sdp) between two entities, then we introduce a convolutional neural network (CNN) which takes the shortest dependency path embeddings as input and performs relation classification with differing objectives for each subtask of the shared task. This approach achieved overall F1 scores of 76.7 and 83.2 for relation classification on clean and noisy data, respectively. Furthermore, for combined relation extraction and classification on clean data, it obtained F1 scores of 37.4 and 33.6 for each phase. Our system ranks 3rd in all three sub-tasks of the shared task.

Evaluation of Domain-specific Word Embeddings using Knowledge Resources
Farhad Nooralahzadeh | Lilja Øvrelid | Jan Tore Lønning
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

NoReC: The Norwegian Review Corpus
Erik Velldal | Lilja Øvrelid | Eivind Alexander Bergem | Cathrine Stadsnes | Samia Touileb | Fredrik Jørgensen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The LIA Treebank of Spoken Norwegian Dialects
Lilja Øvrelid | Andre Kåsen | Kristin Hagen | Anders Nøklestad | Per Erik Solberg | Janne Bondi Johannessen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The 2018 Shared Task on Extrinsic Parser Evaluation: On the Downstream Utility of English Universal Dependency Parsers
Murhaf Fares | Stephan Oepen | Lilja Øvrelid | Jari Björne | Richard Johansson
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We summarize empirical results and tentative conclusions from the Second Extrinsic Parser Evaluation Initiative (EPE 2018). We review the basic task setup, downstream applications involved, and end-to-end results for seventeen participating teams. Based on in-depth quantitative and qualitative analysis, we correlate intrinsic evaluation results at different layers of morpho-syntactic analysis with observed downstream behavior.

Diachronic word embeddings and semantic shifts: a survey
Andrey Kutuzov | Lilja Øvrelid | Terrence Szymanski | Erik Velldal
Proceedings of the 27th International Conference on Computational Linguistics

Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shift detection. We start by discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges before this emerging subfield of NLP, as well as prospects and possible applications.

Syntactic Dependency Representations in Neural Relation Classification
Farhad Nooralahzadeh | Lilja Øvrelid
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP

We investigate the use of different syntactic dependency representations in a neural relation classification task and compare the CoNLL, Stanford Basic and Universal Dependencies schemes. We further compare with a syntax-agnostic approach and perform an error analysis in order to gain a better understanding of the results.

SIRIUS-LTG: An Entity Linking Approach to Fact Extraction and Verification
Farhad Nooralahzadeh | Lilja Øvrelid
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

This article presents the SIRIUS-LTG system for the Fact Extraction and VERification (FEVER) Shared Task. It consists of three components: 1) Wikipedia Page Retrieval: First we extract the entities in the claim, then we find potential Wikipedia URI candidates for each of the entities using a SPARQL query over DBpedia. 2) Sentence Selection: We investigate various techniques, i.e. Smooth Inverse Frequency (SIF), Word Mover’s Distance (WMD), Soft-Cosine Similarity, and Cosine Similarity with unigram Term Frequency Inverse Document Frequency (TF-IDF), to rank sentences by their similarity to the claim. 3) Textual Entailment: We compare three models for the task of claim classification: a Decomposable Attention (DA) model (Parikh et al., 2016), a Decomposed Graph Entailment (DGE) model (Khot et al., 2018) and a Gradient-Boosted Decision Trees (TalosTree) model (Sean et al., 2017). The experiments show that the pipeline with simple Cosine Similarity using TF-IDF in sentence selection, along with the DA model as the labelling model, achieves the best results on the development set (F1 evidence: 32.17, label accuracy: 59.61 and FEVER score: 0.3778). Furthermore, it obtains 30.19, 48.87 and 36.55 in terms of F1 evidence, label accuracy and FEVER score, respectively, on the test set. Our system ranks 15th among 23 participants in the shared task prior to any human-evaluation of the evidence.

Iterative development of family history annotation guidelines using a synthetic corpus of clinical text
Taraka Rama | Pål Brekke | Øystein Nytrø | Lilja Øvrelid
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

In this article, we describe the development of annotation guidelines for family history information in Norwegian clinical text. We make use of incrementally developed synthetic clinical text describing patients’ family history relating to cases of cardiac disease and present a general methodology which integrates the synthetically produced clinical statements and guideline development. We analyze inter-annotator agreement based on the developed guidelines and present results from experiments aimed at evaluating the validity and applicability of the annotated corpus using machine learning techniques. The resulting annotated corpus contains 477 sentences and 6030 tokens. Both the annotation guidelines and the annotated corpus are made freely available and as such constitute the first publicly available resource of Norwegian clinical text.

Expletives in Universal Dependency Treebanks
Gosse Bouma | Jan Hajic | Dag Haug | Joakim Nivre | Per Erik Solberg | Lilja Øvrelid
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

Although treebanks annotated according to the guidelines of Universal Dependencies (UD) now exist for many languages, the goal of annotating the same phenomena in a cross-linguistically consistent fashion is not always met. In this paper, we investigate one phenomenon where we believe such consistency is lacking, namely expletive elements. Such elements occupy a position that is structurally associated with a core argument (or sometimes an oblique dependent), yet are non-referential and semantically void. Many UD treebanks identify at least some elements as expletive, but the range of phenomena differs between treebanks, even for closely related languages, and sometimes even for different treebanks for the same language. In this paper, we present criteria for identifying expletives that are applicable across languages and compatible with the goals of UD, give an overview of expletives as found in current UD treebanks, and present recommendations for the annotation of expletives so that more consistent annotation can be achieved in future releases.

2017

Joint UD Parsing of Norwegian Bokmål and Nynorsk
Erik Velldal | Lilja Øvrelid | Petter Hohle
Proceedings of the 21st Nordic Conference on Computational Linguistics

Optimizing a PoS Tagset for Norwegian Dependency Parsing
Petter Hohle | Lilja Øvrelid | Erik Velldal
Proceedings of the 21st Nordic Conference on Computational Linguistics

Wordnet extension via word embeddings: Experiments on the Norwegian Wordnet
Heidi Sand | Erik Velldal | Lilja Øvrelid
Proceedings of the 21st Nordic Conference on Computational Linguistics

An open-source tool for negation detection: a maximum-margin approach
Martine Enger | Erik Velldal | Lilja Øvrelid
Proceedings of the Workshop Computational Semantics Beyond Events and Roles

This paper presents an open-source toolkit for negation detection. It identifies negation cues and their corresponding scope in either raw or parsed text using maximum-margin classification. The system design draws on best practice from the existing literature on negation detection, aiming for a simple and portable system that still achieves competitive performance. Pre-trained models and experimental results are provided for English.

Tracing armed conflicts with diachronic word embedding models
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of the Events and Stories in the News Workshop

Recent studies have shown that word embedding models can be used to trace time-related (diachronic) semantic shifts in particular words. In this paper, we evaluate some of these approaches on the new task of predicting the dynamics of global armed conflicts on a year-to-year basis, using a dataset from the conflict research field as the gold standard and the Gigaword news corpus as the training data. The results show that much work still remains in extracting ‘cultural’ semantic shifts from diachronic word embedding models. At the same time, we present a new task complete with an evaluation set and introduce the ‘anchor words’ method which outperforms previous approaches on this set.

Downstream use of syntactic analysis: does representation matter?
Lilja Øvrelid
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper deals with using word embedding models to trace the temporal dynamics of semantic relations between pairs of words. The set-up is similar to the well-known analogies task, but expanded with a time dimension. To this end, we apply incremental updating of the models with new training texts, including incremental vocabulary expansion, coupled with learned transformation matrices that let us map between members of the relation. The proposed approach is evaluated on the task of predicting insurgent armed groups based on geographical locations. The gold standard data for the time span 1994–2010 is extracted from the UCDP Armed Conflicts dataset. The results show that the method is feasible and outperforms the baselines, but also that important work still remains to be done.

2016

Universal Dependencies for Norwegian
Lilja Øvrelid | Petter Hohle
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This article describes the conversion of the Norwegian Dependency Treebank (NDT) to the Universal Dependencies (UD) scheme. It details the mapping of PoS tags, morphological features and dependency relations and provides a description of the structural changes made to NDT analyses in order to make them compliant with the UD guidelines. We further present PoS tagging and dependency parsing experiments which report first results for the processing of the converted treebank. The full converted treebank was made available with the 1.2 release of the UD treebanks.

Redefining part-of-speech classes with distributional semantic models
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

OPT: Oslo–Potsdam–Teesside. Pipelining Rules, Rankers, and Classifier Ensembles for Shallow Discourse Parsing
Stephan Oepen | Jonathon Read | Tatjana Scheffler | Uladzimir Sidarenka | Manfred Stede | Erik Velldal | Lilja Øvrelid
Proceedings of the CoNLL-16 shared task

Threat detection in online discussions
Aksel Wester | Lilja Øvrelid | Erik Velldal | Hugo Lewi Hammer
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2015

Improving cross-domain dependency parsing with dependency-derived clusters
Jostein Lien | Erik Velldal | Lilja Øvrelid
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

2014

Sentiment classification of online political discussions: a comparison of a word-based and dependency-based method
Hugo Lewi Hammer | Per Erik Solberg | Lilja Øvrelid
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

The Norwegian Dependency Treebank
Per Erik Solberg | Arne Skjærholt | Lilja Øvrelid | Kristin Hagen | Janne Bondi Johannessen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Norwegian Dependency Treebank is a new syntactic treebank for Norwegian Bokmål and Nynorsk with manual syntactic and morphological annotation, developed at the National Library of Norway in collaboration with the University of Oslo. It is the first publicly available treebank for Norwegian. This paper presents the core principles behind the syntactic annotation and how these principles were employed in certain specific cases. We then present the selection of texts and distribution between genres, as well as the annotation process and an evaluation of the inter-annotator agreement. Finally, we present the first results of data-driven dependency parsing of Norwegian, contrasting four state-of-the-art dependency parsers trained on the treebank. The consistency and the parsability of this treebank are shown to be comparable to other large treebank initiatives.

2013

On Different Approaches to Syntactic Analysis Into Bi-Lexical Dependencies. An Empirical Comparison of Direct, PCFG-Based, and HPSG-Based Parsers
Angelina Ivanova | Stephan Oepen | Rebecca Dridan | Dan Flickinger | Lilja Øvrelid
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)

Survey on parsing three dependency representations for English
Angelina Ivanova | Stephan Oepen | Lilja Øvrelid
51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop

2012

Speculation and Negation: Rules, Rankers, and the Role of Syntax
Erik Velldal | Lilja Øvrelid | Jonathon Read | Stephan Oepen
Computational Linguistics, Volume 38, Issue 2 - June 2012

UiO1: Constituent-Based Discriminative Ranking for Negation Resolution
Jonathon Read | Erik Velldal | Lilja Øvrelid | Stephan Oepen
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

UiO 2: Sequence-labeling Negation Using Dependency Features
Emanuele Lapponi | Erik Velldal | Lilja Øvrelid | Jonathon Read
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

The WeSearch Corpus, Treebank, and Treecache – A Comprehensive Sample of User-Generated Content
Jonathon Read | Dan Flickinger | Rebecca Dridan | Stephan Oepen | Lilja Øvrelid
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present the WeSearch Data Collection (WDC), a freely redistributable, partly annotated, comprehensive sample of User-Generated Content. The WDC contains data extracted from a range of genres of varying formality (user forums, product review sites, blogs and Wikipedia) and covers two different domains (NLP and Linux). In this article, we describe the data selection and extraction process, with a focus on the extraction of linguistic content from different sources. We present the format of syntacto-semantic annotations found in this resource and present initial parsing results for these data, as well as some reflections following a first round of treebanking.

Who Did What to Whom? A Contrastive Study of Syntacto-Semantic Dependencies
Angelina Ivanova | Stephan Oepen | Lilja Øvrelid | Dan Flickinger
Proceedings of the Sixth Linguistic Annotation Workshop

Lexical Categories for Improved Parsing of Web Data
Lilja Øvrelid | Arne Skjærholt
Proceedings of COLING 2012: Posters

2010

Resolving Speculation: MaxEnt Cue Classification and Dependency-Based Scope Rules
Erik Velldal | Lilja Øvrelid | Stephan Oepen
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

Syntactic Scope Resolution in Uncertainty Analysis
Lilja Øvrelid | Erik Velldal | Stephan Oepen
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Informed ways of improving data-driven dependency parsing for German
Wolfgang Seeker | Bernd Bohnet | Lilja Øvrelid | Jonas Kuhn
Coling 2010: Posters

Towards a Large Parallel Corpus of Cleft Constructions
Gerlof Bouma | Lilja Øvrelid | Jonas Kuhn
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present our efforts to create a large-scale, semi-automatically annotated parallel corpus of cleft constructions. The corpus is intended to reduce or make more effective the manual task of finding examples of clefts in a corpus. The corpus is being developed in the context of the Collaborative Research Centre SFB 632, which is a large, interdisciplinary research initiative to study information structure, at the University of Potsdam and the Humboldt University in Berlin. The corpus is based on the Europarl corpus (version 3). We show how state-of-the-art NLP tools, like POS taggers and statistical dependency parsers, may facilitate powerful and precise searches. We argue that identifying clefts using automatically added syntactic structure annotation is ultimately to be preferred over using lower level, though more robust, extraction methods like regular expression matching. An evaluation of the extraction method for one of the languages also offers some support for this method. We end the paper by discussing the resulting corpus itself. We present some examples of interesting clefts and translational counterparts from the corpus and suggest ways of exploiting our newly created resource in the cross-linguistic study of clefts.

Training Parsers on Partial Trees: A Cross-language Comparison
Kathrin Spreyer | Lilja Øvrelid | Jonas Kuhn
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present a study that compares data-driven dependency parsers obtained by means of annotation projection between language pairs of varying structural similarity. We show how the partial dependency trees projected from English to Dutch, Italian and German can be exploited to train parsers for the target languages. We evaluate the parsers against manual gold standard annotations and find that the projected parsers substantially outperform our heuristic baselines by 9–25% UAS, which corresponds to a 21–43% reduction in error rate. A comparative error analysis focuses on how the projected target language parsers handle subjects, which is especially interesting for Italian as an instance of a pro-drop language. For Dutch, we further present experiments with German as an alternative source language. In both source languages, we contrast standard baseline parsers with parsers that are enhanced with the predictions from large-scale LFG grammars through a technique of parser stacking, and show that improvements of the source language parser can directly lead to similar improvements of the projected target language parser.

2009

Cross-lingual porting of distributional semantic classification
Lilja Øvrelid
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

Improving data-driven dependency parsing using large-scale LFG grammars
Lilja Øvrelid | Jonas Kuhn | Kathrin Spreyer
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Empirical Evaluations of Animacy Annotation
Lilja Øvrelid
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

Linguistic features in data-driven dependency parsing
Lilja Ovrelid
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

2006

Towards Robust Animacy Classification Using Morphosyntactic Distributional Features
Lilja Øvrelid
Student Research Workshop