Eckhard Bick

Also published as: E. Bick


2020

Syntax and Semantics in a Treebank for Esperanto
Eckhard Bick
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper we describe and evaluate syntactic and semantic aspects of Arbobanko, a treebank for the artificial language Esperanto, as well as tools and methods used in the production of the treebank. In addition to classical morphosyntax and dependency structure, the treebank was enriched with a lexical-semantic layer covering named entities, a semantic type ontology for nouns and adjectives, and a FrameNet-inspired semantic classification of verbs. For an under-resourced language, the quality of automatic syntactic and semantic pre-annotation is of obvious importance, and by evaluating the underlying parser and the coverage of its semantic ontologies, we try to answer the question of whether the language’s extremely regular morphology and transparent semantic affixes translate into a more regular syntax and higher parsing accuracy. On the linguistic side, the treebank allows us to address and quantify typological issues such as word order, auxiliary constructions, lexical transparency and semantic type ambiguity in Esperanto.

An Annotated Social Media Corpus for German
Eckhard Bick
Proceedings of the 12th Language Resources and Evaluation Conference

This paper presents the German Twitter section of a large (2 billion word) bilingual Social Media corpus for Hate Speech research, discussing the compilation, pseudonymization and grammatical annotation of the corpus, as well as special linguistic features and peculiarities encountered in the data. Among other things, compounding, accidental and intentional orthographic variation, gendering and the use of emoticons/emojis are addressed in a genre-specific fashion. We present the different layers of linguistic annotation (morphosyntactic, dependencies and semantic types) and explain how a general parser (GerGram) can be made to work on Social Media data, pointing out necessary adaptations and extensions. In an evaluation run on a random cross-section of tweets, the modified parser achieved F-scores of 97% for morphology (fine-grained POS) and 92% for syntax (labeled attachment score). Predictably, performance was twice as good in tweets with standard orthography as in tweets with spelling/casing irregularities or lack of sentence separation, the effect being more marked for morphology than for syntax.

2019

A Semantic Ontology of Danish Adjectives
Eckhard Bick
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

This paper presents a semantic annotation scheme for Danish adjectives, focusing both on prototypical semantic content and semantic collocational restrictions on an adjective’s head noun. The core type set comprises about 110 categories ordered in a shallow hierarchy with 14 primary and 25 secondary umbrella categories. In addition, domain information and binary sentiment tags are provided, as well as VerbNet-derived frames and semantic roles for those adjectives governing arguments. The scheme has been almost fully implemented on the lexicon of the Danish VISL parser, DanGram, containing 14,000 adjectives. We discuss the annotation scheme and its applicational perspectives, and present a statistical breakdown and coverage evaluation for three Danish reference corpora.

Automatic Generation and Semantic Grading of Esperanto Sentences in a Teaching Context
Eckhard Bick
Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning

2017

From Treebank to Propbank: A Semantic-Role and VerbNet Corpus for Danish
Eckhard Bick
Proceedings of the 21st Nordic Conference on Computational Linguistics

Universal Dependencies for Portuguese
Alexandre Rademaker | Fabricio Chalub | Livy Real | Cláudia Freitas | Eckhard Bick | Valeria de Paiva
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

Propbank Annotation of Danish Noun Frames
Eckhard Bick
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

2016

A Morphological Lexicon of Esperanto with Morpheme Frequencies
Eckhard Bick
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper discusses the internal structure of complex Esperanto words (CWs). Using a morphological analyzer, possible affixation and compounding was checked for over 50,000 Esperanto lexemes against a list of 17,000 root words. Morpheme boundaries in the resulting analyses were then checked manually, creating a CW dictionary of 28,000 words, representing 56.4% of the lexicon, or 19.4% of corpus tokens. The error percentage of the EspGram morphological analyzer for new corpus CWs was 4.3% for types and 6.4% for tokens, with a recall of almost 100%, and wrong/spurious boundaries being more common than missing ones. For pedagogical purposes a morpheme frequency dictionary was constructed for a 16 million word corpus, confirming the importance of agglutinative derivational morphemes in the Esperanto lexicon. Finally, as a means to reduce the morphological ambiguity of CWs, we provide POS likelihoods for Esperanto suffixes.

Constraint Grammar-based conversion of Dependency Treebanks
Eckhard Bick
Proceedings of the 13th International Conference on Natural Language Processing

2015

DanProof: Pedagogical Spell and Grammar Checking for Danish
Eckhard Bick
Proceedings of the International Conference Recent Advances in Natural Language Processing

CG-3 — Beyond Classical Constraint Grammar
Eckhard Bick | Tino Didriksen
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

WikiTrans: Swedish-Danish Machine Translation in a Constraint Grammar Framework
Eckhard Bick
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

2014

ML-Optimization of Ported Constraint Grammars
Eckhard Bick
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we describe how a Constraint Grammar with linguist-written rules can be optimized and ported to another language using a Machine Learning technique. The effects of rule movements, sorting, grammar-sectioning and systematic rule modifications are discussed and quantitatively evaluated. Statistical information is used to provide a baseline and to enhance the core of manual rules. The best-performing parameter combinations achieved part-of-speech F-scores of over 92 for a grammar ported from English to Danish, a considerable advance over both the statistical baseline (85.7), and the raw ported grammar (86.1). When the same technique was applied to an existing native Danish CG, error reduction was 10% (F=96.94).
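Gains of this kind are often restated as relative error reduction. The following is a minimal sketch of that calculation, not taken from the paper: the function name `error_reduction` is hypothetical, and the formula is the common convention, assumed here because the abstract does not spell out how its 10% figure was computed.

```python
def error_reduction(f_before: float, f_after: float) -> float:
    """Relative error reduction in percent, for F-scores on a 0-100 scale.

    Assumption: "error" is taken to be 100 minus the F-score.
    """
    error_before = 100.0 - f_before
    error_after = 100.0 - f_after
    if error_before <= 0:
        raise ValueError("f_before must be below 100")
    return 100.0 * (error_before - error_after) / error_before


# Figures from the abstract: raw ported grammar (86.1) versus the optimized
# grammar. The abstract only says "over 92", so 92.0 is a lower bound.
ported_gain = error_reduction(86.1, 92.0)
print(f"at least {ported_gain:.1f}% error reduction")
```

Under this convention, moving from 86.1 to 92 removes a little over two fifths of the remaining errors, which puts the 10% reduction reported for the native Danish grammar in perspective: that grammar started from a much higher baseline.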

2013

Using Constraint Grammar for Chunking
Eckhard Bick
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

ML-Tuned Constraint Grammars
Eckhard Bick
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)

2012

The annotation of the C-ORAL-BRASIL oral through the implementation of the Palavras Parser
Eckhard Bick | Heliana Mello | Alessandro Panunzi | Tommaso Raso
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This article describes the morphosyntactic annotation of the C-ORAL-BRASIL speech corpus, using an adapted version of the Palavras parser. In order to achieve compatibility with annotation rules designed for standard written Portuguese, transcribed words were orthographically normalized, and the parsing lexicon augmented with speech-specific material, phonetically spelled abbreviations etc. Using a two-level annotation approach, speech flow markers like overlaps, retractions and non-verbal productions were separated from running, annotatable text. In the absence of punctuation, syntactic segmentation was achieved by exploiting prosodic break markers, enhanced by a rule-based distinction between pause and break functions. Under optimal conditions, the modified parsing system achieved correctness rates (F-scores) of 98.6% for part of speech, 95% for syntactic function and 99% for lemmatization. Especially at the syntactic level, a clear connection between accessibility of prosodic break markers and annotation performance could be documented.

Towards a Semantic Annotation of English Television News - Building and Evaluating a Constraint Grammar FrameNet
Eckhard Bick
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

Tailored Feature Extraction for Lexical Disambiguation of English Verbs Based on Corpus Pattern Analysis
Martin Holub | Vincent Kríž | Silvie Cinková | Eckhard Bick
Proceedings of COLING 2012

2011

A FrameNet for Danish
Eckhard Bick
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

A Bare-bones Constraint Grammar
Eckhard Bick
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2010

Degrees of Orality in Speech-like Corpora: Comparative Annotation of Chat and E-mail Corpora
Eckhard Bick
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

FrAG, a Hybrid Constraint Grammar Parser for French
Eckhard Bick
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes a hybrid system (FrAG) for tagging / parsing French text, and presents results from ongoing development work, corpus annotation and evaluation. The core of the system is a sentence scope Constraint Grammar (CG), with linguist-written rules. However, unlike traditional CG, the system uses hybrid techniques on both its morphological input side and its syntactic output side. Thus, FrAG draws on a pre-existing probabilistic Decision Tree Tagger (DTT) before and in parallel with its own lexical stage, and feeds its output into a Phrase Structure Grammar (PSG) that uses CG syntactic function tags rather than ordinary terminals in its rewriting rules. As an alternative architecture, dependency tree structures are also supported. In the newest version, dependencies are assigned within the CG-framework itself, and can interact with other rules. To provide semantic context, a semantic prototype ontology for nouns is used, covering a large part of the lexicon. In a recent test run on Parliamentary debate transcripts, FrAG achieved F-scores of 98.7% for part of speech (PoS) and between 93.1% and 96.2% for syntactic function tags. Dependency links were correct in 95.9% of cases.

2009

Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)
Kristiina Jokinen | Eckhard Bick
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

Automatic Semantic Role Annotation for Spanish
Eckhard Bick | M. Pilar Valverde Ibáñez
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

DeepDict–A Graphical Corpus-based Dictionary of Word Relations
Eckhard Bick
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

2007

Using Danish as a CG Interlingua: A Wide-Coverage Norwegian-English Machine Translation System
Eckhard Bick | Lars Nygaard
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

Hybrid Ways to Improve Domain Independence in an ML Dependency Parser
Eckhard Bick
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Turning a Dependency Treebank into a PSG-style Constituent Treebank
Eckhard Bick
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we present and evaluate a new method to convert Constraint Grammar (CG) parses of running text into Constituent Treebanks. The conversion has two steps: first, a grammar-based method is used to bridge the gap between raw CG annotation and full dependency structure; then phrase structure bracketing and non-terminal nodes are introduced by clustering sister dependents, effectively building one syntactic treebank on top of another. The method is compared with another approach (Bick 2003-2), where constituent structures are arrived at by employing a function-tag based Phrase Structure Grammar (PSG). Results are evaluated on a small reference corpus for both raw and revised CG input, with bracketing F-Scores of 87.5% for raw text and 97.1% for revised CG input, and a raw text edge label accuracy of 95.9% for forms and 86% for functions, or 99.7% and 99.4%, respectively, for revised CG. By applying the tools to the CG-only part of the Danish Arboretum treebank we were able to increase the size of the treebank by 86%, from 197,400 to 367,500 words.
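The second step, grouping a head with its sister dependents into a bracketed phrase, can be illustrated with a toy sketch. This is not the paper's implementation: the function `to_brackets`, the tuple layout, the `H:` head marker and the function labels are all hypothetical, and the sketch assumes a projective dependency tree with a single root whose head id is 0.

```python
from collections import defaultdict


def to_brackets(tokens):
    """Turn a dependency tree into a bracketed constituent string.

    tokens: list of (id, form, head_id, function) tuples, with ids in
    surface order and head_id 0 marking the root (assumed layout).
    """
    children = defaultdict(list)
    by_id = {}
    for tok_id, form, head, func in tokens:
        children[head].append(tok_id)
        by_id[tok_id] = (form, func)

    def build(tok_id):
        form, func = by_id[tok_id]
        deps = children.get(tok_id, [])
        if not deps:
            # A word without dependents stays a terminal node.
            return f"{func}:{form}"
        # Cluster the head with its sister dependents in surface order;
        # the phrase inherits the head's function label.
        parts = sorted(deps + [tok_id])
        inner = " ".join(
            f"H:{form}" if part == tok_id else build(part) for part in parts
        )
        return f"[{func} {inner}]"

    return " ".join(build(root) for root in children[0])


example = [
    (1, "the", 2, "DET"),
    (2, "dog", 3, "SUBJ"),
    (3, "barks", 0, "ROOT"),
]
print(to_brackets(example))  # [ROOT [SUBJ DET:the H:dog] H:barks]
```

Each non-terminal here is simply a head plus its direct dependents, so the bracketing depth mirrors the depth of the dependency tree. Discontinuous (non-projective) structures would need extra handling that this sketch leaves out.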

Semantic tagging for resolution of indirect anaphora
R. Vieira | E. Bick | J. Coelho | V. Muller | S. Collovini | J. Souza | L. Rino
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue

LingPars, a Linguistically Inspired, Language-Independent Machine Learner for Dependency Treebanks
Eckhard Bick
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

2004

A Named Entity Recognizer for Danish
Eckhard Bick
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Floresta Sintá(c)tica: A treebank for Portuguese
Susana Afonso | Eckhard Bick | Renato Haber | Diana Santos
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

The VISL System: Research and applicative aspects of IT-based learning
Eckhard Bick
Proceedings of the 13th Nordic Conference of Computational Linguistics (NODALIDA 2001)

2000

Providing Internet Access to Portuguese Corpora: the AC/DC Project
Diana Santos | Eckhard Bick
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

Structural Lexical Heuristics in the Automatic Analysis of Portuguese
Eckhard Bick
Proceedings of the 11th Nordic Conference of Computational Linguistics (NODALIDA 1998)