Claus Povlsen


2016

Providing a Catalogue of Language Resources for Commercial Users
Bente Maegaard | Lina Henriksen | Andrew Joscelyne | Vesna Lusicky | Margaretha Mazura | Sussi Olsen | Claus Povlsen | Philippe Wacker
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Language resources (LRs) are indispensable for the development of tools for machine translation (MT) or various kinds of computer-assisted translation (CAT). In particular, language corpora, both parallel and monolingual, are considered most important, for instance for MT, not only statistical MT (SMT) but also hybrid MT. The Language Technology Observatory will provide easy access to information about LRs deemed useful for MT and other translation tools through its LR Catalogue. In order to determine which aspects of an LR are useful for MT practitioners, a user study was conducted, providing a guide to the most relevant metadata and the most relevant quality criteria. We have seen that many resources exist which are useful for MT and similar work, but the majority are for (academic) research or educational use only, and as such are not available for commercial use. Our work has revealed a list of gaps: a coverage gap, an awareness gap, a quality gap, and a quantity gap. The paper ends with recommendations for a forward-looking strategy.

2014

Encompassing a spectrum of LT users in the CLARIN-DK Infrastructure
Lina Henriksen | Dorte Haltrup Hansen | Bente Maegaard | Bolette Sandford Pedersen | Claus Povlsen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

CLARIN-DK is a platform with language resources constituting the Danish part of the European infrastructure CLARIN ERIC. Unlike some other language-based infrastructures, CLARIN-DK is not solely a repository for uploading and storing data, but also a platform of web services permitting the user to process data in various ways. This involves considerable complications in relation to workflow requirements. The CLARIN-DK interface must guide the user through the necessary steps of a workflow, even when the user is inexperienced and perhaps has an unclear conception of the requested results. This paper describes a user-driven approach to creating a user interface specification for CLARIN-DK. We indicate how different user profiles determined different crucial interface design options. We also describe some use cases established to give illustrative examples of how the platform may facilitate research.

2010

Incorporating Speech Synthesis in the Development of a Mobile Platform for e-learning.
Justus Roux | Pieter Scholtz | Daleen Klop | Claus Povlsen | Bart Jongejan | Asta Magnusdottir
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This presentation and accompanying demonstration focus on the development of a mobile platform for e-learning purposes with enhanced text-to-speech capabilities. It reports on an international consortium project entitled Mobile E-learning for Africa (MELFA), which includes a reading and literacy training component, particularly focusing on an African language, isiXhosa. The high penetration rate of mobile phones on the African continent has created new opportunities for delivering various kinds of information, including e-learning material, to communities that have not had appropriate infrastructures. Aspects of the mobile platform development are described, paying attention to basic functionalities of the user interface as well as to the underlying web technologies involved. Some of the main features of the literacy training module are described, such as grapheme-sound correspondence, syllabification-sound relationships, and varying tempo of presentation. A particular case is made for using HMM-based (HTS) synthesis here, as it seems very appropriate for less-resourced languages.

2008

Merging a Syntactic Resource with a WordNet: a Feasibility Study of a Merge between STO and DanNet
Bolette Sandford Pedersen | Anna Braasch | Lina Henriksen | Sussi Olsen | Claus Povlsen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents a feasibility study of a merge between SprogTeknologisk Ordbase (STO), which contains morphological and syntactic information, and DanNet, a Danish WordNet containing semantic information in terms of synonym sets and semantic relations. The aim of the merge is to develop a richer, composite resource which we believe will have a broader usage perspective than the two seen in isolation. In STO, the organizing principle is based on the observable syntactic features of a lemma’s near context (labeled syntactic units or SynUs). In contrast, the basic unit in DanNet is the semantic sense or, in wordnet terminology, the synonym set (synset). The merge of the two resources is thus basically to be understood as a linking between SynUs and synsets. In the paper we discuss which parts of the merge can be performed semi-automatically and which parts require manual linguistic matching procedures. We estimate that this manual work will amount to approximately 39% of the lexicon material.

Domain specific MT in use
Lene Offersgaard | Claus Povlsen | Lisbeth Almsten | Bente Maegaard
Proceedings of the 12th Annual conference of the European Association for Machine Translation

2006

EuroTermBank - a Terminology Resource based on Best Practice
Lina Henriksen | Claus Povlsen | Andrejs Vasiljevs
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The new EU member countries face the problems of terminology resource fragmentation and a general lack of coordination in terminology development. The EuroTermBank project aims to contribute to improving the terminology infrastructure of the new EU countries, and the project will result in a centralized online terminology bank, interlinked with other terminology banks and resources, for the languages of the new EU member countries. The main focus of this paper is a description of how to identify best practice within terminology work seen from a broad perspective. Surveys of real-life terminology work have been conducted, and these surveys have resulted in scenario-specific descriptions of best practice in terminology work. Furthermore, this paper outlines the specific criteria that have been used to select existing term resources for inclusion in the EuroTermBank database.

The MULINCO corpus and corpus platform
Bente Maegaard | Lene Offersgaard | Lina Henriksen | Hanne Jansen | Xavier Lepetit | Costanza Navarretta | Claus Povlsen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The MULINCO project (MUltiLINgual Corpus of the University of Copenhagen) started in early 2005. The purpose of this cross-disciplinary project is to create a corpus platform for education and research in monolingual and translation studies. The project covers two main types of corpus texts: literary and non-literary. The platform is being developed using available tools as far as possible and integrating them in a very open architecture. In this paper we describe the current status and future developments of both the text and tool sides of the corpus platform, and we show some examples of student exercises that take advantage of tagged and aligned texts.

1994

Natural language processing in dialogue systems with spoken input
Claus Povlsen
Proceedings of the 9th Nordic Conference of Computational Linguistics (NODALIDA 1993)