Helge Dyvik


2017

Exploring Treebanks with INESS Search
Victoria Rosén | Helge Dyvik | Paul Meurer | Koenraad De Smedt
Proceedings of the 21st Nordic Conference on Computational Linguistics

2016

NorGramBank: A ‘Deep’ Treebank for Norwegian
Helge Dyvik | Paul Meurer | Victoria Rosén | Koenraad De Smedt | Petter Haugereid | Gyri Smørdal Losnegaard | Gunn Inger Lyse | Martha Thunes
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present NorGramBank, a treebank for Norwegian with highly detailed LFG analyses. It is one of many treebanks made available through the INESS treebanking infrastructure. NorGramBank was constructed as a parsebank, i.e. by automatically parsing a corpus with the wide-coverage grammar NorGram. One part, consisting of 350,000 words, has been manually disambiguated using computer-generated discriminants. A larger part of 50 million words has been stochastically disambiguated. The treebank is dynamic: global reparsing at certain intervals keeps it compatible with the latest versions of the grammar and the lexicon, which are under continual development in interaction with the annotators. A powerful query language, INESS Search, has been developed for search across formalisms in the INESS treebanks, including LFG c- and f-structures. Evaluation shows that the grammar provides about 85% of randomly selected sentences with good analyses. Agreement among the annotators responsible for manual disambiguation is satisfactory, but also suggests desirable simplifications of the grammar.

2014

The Interplay Between Lexical and Syntactic Resources in Incremental Parsebanking
Victoria Rosén | Petter Haugereid | Martha Thunes | Gyri S. Losnegaard | Helge Dyvik
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Automatic syntactic analysis of a corpus requires detailed lexical and morphological information that cannot always be harvested from traditional dictionaries. In building the INESS Norwegian treebank, necessary lexical information is often missing from the morphology or lexicon. The approach used to build the treebank is incremental parsebanking: a corpus is parsed with an existing grammar, and the analyses are efficiently disambiguated by annotators. When the intended analysis is unavailable after parsing, the reason is often that necessary information is not available in the lexicon. INESS has therefore implemented a text preprocessing interface where annotators can enter unrecognized words before parsing. This may concern words that are unknown to the morphology and/or lexicon, as well as words that are known but for which important information is missing. Once this information is added, either during text preprocessing or during disambiguation, the intended analysis can be chosen after reparsing and stored in the treebank. The lexical information added to the lexicon in this way may be of great interest both to lexicographers and to other language technology efforts, and the enriched lexical resource being developed will be made available at the end of the project.

2013

The INESS Treebanking Infrastructure
Paul Meurer | Helge Dyvik | Victoria Rosén | Koenraad De Smedt | Gunn Inger Lyse | Gyri Smørdal Losnegaard | Martha Thunes
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

ParGramBank: The ParGram Parallel Treebank
Sebastian Sulger | Miriam Butt | Tracy Holloway King | Paul Meurer | Tibor Laczkó | György Rákosi | Cheikh Bamba Dione | Helge Dyvik | Victoria Rosén | Koenraad De Smedt | Agnieszka Patejuk | Özlem Çetinoğlu | I Wayan Arka | Meladel Mistica
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2005

Holistic regression testing for high-quality MT: some methodological and technological reflections
Stephan Oepen | Helge Dyvik | Dan Flickinger | Jan Tore Lønning | Paul Meurer | Victoria Rosén
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2002

The Parallel Grammar Project
Miriam Butt | Helge Dyvik | Tracy Holloway King | Hiroshi Masuichi | Christian Rohrer
COLING-02: Grammar Engineering and Evaluation

1992

Linguistics and Machine Translation
Helge Dyvik
Proceedings of the 8th Nordic Conference of Computational Linguistics (NODALIDA 1991)

1984

Parsing basert på LFG: Et MIT/Xerox-system applisert på norsk (Parsing based on LFG: An MIT/Xerox system applied to Norwegian) [In Norwegian]
Helge Dyvik | Knut Hofland
Proceedings of the 4th Nordic Conference of Computational Linguistics (NODALIDA 1983)