Anders Nøklestad


2020

Comparing Methods for Measuring Dialect Similarity in Norwegian
Janne Johannessen | Andre Kåsen | Kristin Hagen | Anders Nøklestad | Joel Priestley
Proceedings of the 12th Language Resources and Evaluation Conference

This article presents four experiments with two different methods for measuring dialect similarity in Norwegian: the Levenshtein method and the neural long short-term memory (LSTM) autoencoder network, a machine learning algorithm. The visual output, in the form of dialect maps, is then compared with canonical maps found in the dialect literature. These comparisons show that one does not need fine-grained transcriptions of speech to replicate classical classification patterns.
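The Levenshtein method mentioned above compares pronunciations by counting the edits needed to turn one transcription into another. Below is a minimal Python sketch, assuming plain unit-cost edit distance over transcribed strings, normalisation by the length of the longer string, and averaging over the words transcribed at both sites. The paper's actual weighting, alignment and aggregation are not described here, and the function names and sample word forms are hypothetical.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance: insertions, deletions and substitutions."""
    # prev[j] is the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalised_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string (assumed normalisation)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def dialect_distance(site_a: dict[str, str], site_b: dict[str, str]) -> float:
    """Mean normalised distance over words transcribed at both sites (hypothetical layout)."""
    shared = site_a.keys() & site_b.keys()
    if not shared:
        raise ValueError("the two sites share no transcribed words")
    return sum(normalised_distance(site_a[w], site_b[w]) for w in shared) / len(shared)
```

For instance, comparing a hypothetical site with `{"ikke": "ikkje", "jeg": "eg"}` against one with `{"ikke": "ikke", "jeg": "jeg"}` averages the normalised distances 1/5 and 1/3. A full site-by-site distance matrix built this way could then be clustered or projected onto a map.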

2019

Tagging a Norwegian Dialect Corpus
Andre Kåsen | Anders Nøklestad | Kristin Hagen | Joel Priestley
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This paper describes an evaluation of five data-driven part-of-speech (PoS) taggers for spoken Norwegian. The taggers all rely on different machine learning mechanisms: decision trees, hidden Markov models (HMMs), conditional random fields (CRFs), long short-term memory networks (LSTMs), and convolutional neural networks (CNNs). We go into some of the challenges posed by the task of tagging spoken, as opposed to written, language, and in particular the wide range of dialects found in the recordings of the LIA (Language Infrastructure made Accessible) project. The results show that the taggers based on either conditional random fields or neural networks perform much better than the rest, with the LSTM tagger getting the highest score.

2018

The LIA Treebank of Spoken Norwegian Dialects
Lilja Øvrelid | Andre Kåsen | Kristin Hagen | Anders Nøklestad | Per Erik Solberg | Janne Bondi Johannessen
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

A modernised version of the Glossa corpus search system
Anders Nøklestad | Kristin Hagen | Janne Bondi Johannessen | Michał Kosek | Joel Priestley
Proceedings of the 21st Nordic Conference on Computational Linguistics

2013

Exploring Features for Named Entity Recognition in Lithuanian Text Corpus
Jurgita Kapočiūtė-Dzikienė | Anders Nøklestad | Janne Bondi Johannessen | Algis Krupavičius
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

The Nordic Dialect Corpus
Janne Bondi Johannessen | Joel Priestley | Kristin Hagen | Anders Nøklestad | André Lynum
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, we describe the Nordic Dialect Corpus, which has recently been completed. The corpus has a variety of features that, combined, make it an advanced tool for language researchers. These features include: linguistic contents (dialects from five closely related languages), annotation (tagging and two types of transcription), search interface (advanced possibilities for combining a large array of search criteria, and results presentation in an intuitive and simple interface), many search variables (linguistics-based, informant-based, time-based), multimedia display (linking of sound and video to transcriptions), display of results in maps, display of informant details (number of words and other information on informants), and advanced results handling (concordances, collocations, counts and statistics shown in a variety of graphical modes, plus further processing). Finally, and importantly, the corpus is freely available for research on the web. We give examples of various kinds of searches, result displays and results handling.

2010

A Multilingual Speech Resource: The Nordic Dialect Corpus
Janne Bondi Johannessen | Joel Priestley | Anders Nøklestad
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

Enhancing Language Resources with Maps
Janne Bondi Johannessen | Kristin Hagen | Anders Nøklestad | Joel Priestley
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We look at how maps can be integrated in research resources such as language databases and language corpora. By using maps, search results can be illustrated in a way that immediately gives the user information that words or numbers on their own would not. We illustrate this with two different resources into which we have integrated a Google Maps application: The Nordic Dialect Corpus (Johannessen et al. 2009) and The Nordic Syntactic Judgments Database (Lindstad et al. 2009). The database contains some hundred syntactic test sentences that have been evaluated by four speakers in more than a hundred locations in Norway and Sweden. Searching for the evaluations of a particular sentence gives a list of several hundred judgments, which are difficult for a human researcher to assess. With the map option, isoglosses are immediately visible. We show in the paper that, for both the maps depicting corpus hits and the maps depicting database results, the visualizations reveal clear geographical differences that would be very difficult to spot just by reading concordance lines or database tables.

2009

The Nordic Dialect Database: Mapping Microsyntactic Variation in the Scandinavian Languages
Arne Martinus Lindstad | Anders Nøklestad | Janne Bondi Johannessen | Øystein Alexander Vangsnes
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)

2008

Glossa: a Multilingual, Multimodal, Configurable User Interface
Lars Nygaard | Joel Priestley | Anders Nøklestad | Janne Bondi Johannessen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe a web-based corpus query system, Glossa, which combines the expressiveness of regular query languages with the user-friendliness of a graphical interface. Since corpus users are usually linguists with little interest in technical matters, we have developed a system where the user need not have any prior knowledge of the search system. Furthermore, no previous knowledge of abbreviations for metavariables such as part of speech and source text is needed. All searches are done using checkboxes, pull-down menus, or by typing letters to form words or other strings. Querying for more than one word is simply done by adding an additional query box, and querying for parts of words by choosing a feature such as “start of word”. The Glossa system also allows a wide range of viewing and post-processing options. Collocations can be viewed and counted in a number of ways, and displayed as different kinds of graphical charts. Individual results can also easily be annotated further or deleted before further processing. The Glossa system is already in use for a number of corpora. Corpus administrators can easily adapt the system to a wide range of corpora, including multilingual corpora and corpora with audio and video content.

2007

Tagging a Norwegian Speech Corpus
Anders Nøklestad | Åshild Søfteland
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

2006

Developing a re-usable web-demonstrator for automatic anaphora resolution with support for manual editing of coreference chains
Anders Nøklestad | Øystein Reigem | Christer Johansson
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Automatic markup and editing of anaphora and coreference are performed within one system. The processing is trained using memory-based learning, and representations are derived from various lexical resources. The current model reaches an expected combined precision and recall of F=62. Further improvement of the coreference detection is work in progress. Editing of coreference is separated into a module working on an XML file, so the editing mechanism can be reused in other projects. The editor is designed to store on the server a copy of every file edited over the internet using our demonstrator. This might help us expand our database of texts annotated for anaphora and coreference. Further research includes creating high-coverage lexical resources and modules for other languages. The current system is trained on Norwegian Bokmål, but we hope to extend it to other languages with available tools (e.g. POS taggers).
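The abstract above reports a single figure, F=62, that combines precision and recall. As a reminder of how such a figure is usually formed, here is a minimal Python sketch of the F-measure, the weighted harmonic mean of precision and recall. It assumes the balanced F1 (beta = 1) on a 0 to 1 scale; the abstract does not state which variant was used, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta=1 gives F1."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing is correct
    b2 = beta * beta
    # beta > 1 weights recall more heavily, beta < 1 weights precision
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Because the harmonic mean is pulled towards the smaller of the two values, a high F requires both precision and recall to be reasonably high. A score reported as F=62 is presumably this value multiplied by 100.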

Detecting reference chains in Norwegian
Anders Nøklestad | Christer Johansson
Proceedings of the 15th Nordic Conference of Computational Linguistics (NODALIDA 2005)

2004

Memory-based Classification of Proper Names in Norwegian
Anders Nøklestad
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2000

A Web-based Advanced and User Friendly System: The Oslo Corpus of Tagged Norwegian Texts
Janne Bondi Johannessen | Anders Nøklestad | Kristin Hagen
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

The shortcomings of a tagger
Kristin Hagen | Janne Bondi Johannessen | Anders Nøklestad
Proceedings of the 12th Nordic Conference of Computational Linguistics (NODALIDA 1999)