Jón Guðnason


2020

Manual Speech Synthesis Data Acquisition - From Script Design to Recording Speech
Atli Sigurgeirsson | Gunnar Örnólfsson | Jón Guðnason
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Atli Þór Sigurgeirsson, atlithors@ru.is, Reykjavik University; Gunnar Thor Örnólfsson, gunnarthor@hi.is, Árni Magnússon Institute for Icelandic Studies; Dr. Jón Guðnason, jg@ru.is

In this paper we present the work of collecting a large amount of high-quality speech synthesis data for Icelandic. Eight speakers will be recorded for 20 hours each. A script design strategy is proposed, and three scripts of varying length have been generated to maximize diphone coverage. The largest reading script contains 14,400 prompts and includes 87.3% of all Icelandic diphones at least once and 81% of all Icelandic diphones at least twenty times. A recording client was developed to facilitate recording sessions. The client makes it easy to import scripts and to maintain multiple collections in parallel. The recorded data can be downloaded straight from the client. Recording sessions, which started in October 2019, are carried out under supervision in a professional studio. At the time of writing, 58.7 hours of high-quality speech data have been collected. The scripts, the recording software and the speech data will later be released under a CC-BY 4.0 license.

Language Technology Programme for Icelandic 2019-2023
Anna Nikulásdóttir | Jón Guðnason | Anton Karl Ingason | Hrafn Loftsson | Eiríkur Rögnvaldsson | Einar Freyr Sigurðsson | Steinþór Steingrímsson
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we describe a new national language technology programme for Icelandic. The programme, which spans a period of five years, aims at making Icelandic usable in communication and interactions in the digital world by developing accessible, open-source language resources and software. The research and development work within the programme is carried out by a consortium of universities, institutions, and private companies, with a strong emphasis on cooperation between academia and industry. The programme consists of five core projects: language resources, speech recognition, speech synthesis, machine translation, and spell and grammar checking. We also describe other national language technology programmes and give an overview of the history of language technology in Iceland.

2018

Open ASR for Icelandic: Resources and a Baseline System
Anna Björk Nikulásdóttir | Inga Rún Helgadóttir | Matthías Pétursson | Jón Guðnason
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Risamálheild: A Very Large Icelandic Text Corpus
Steinþór Steingrímsson | Sigrún Helgadóttir | Eiríkur Rögnvaldsson | Starkaður Barkarson | Jón Guðnason
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Málrómur: A Manually Verified Corpus of Recorded Icelandic Speech
Steinþór Steingrímsson | Jón Guðnason | Sigrún Helgadóttir | Eiríkur Rögnvaldsson
Proceedings of the 21st Nordic Conference on Computational Linguistics