Lars Borin


2020

Proceedings of the 9th Workshop on NLP for Computer Assisted Language Learning
David Alfter | Elena Volodina | Ildikó Pilán | Herbert Lange | Lars Borin
Proceedings of the 9th Workshop on NLP for Computer Assisted Language Learning

Towards a Swedish Roget-Style Thesaurus for NLP
Niklas Zechner | Lars Borin
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

Bring’s thesaurus (Bring) is a Swedish counterpart of Roget, and its digitized version could make a valuable language resource for use in many and diverse natural language processing (NLP) applications. From the literature we know that Roget-style thesauruses and wordnets have complementary strengths in this context, so both kinds of lexical-semantic resource are good to have. However, Bring was published in 1930, and its lexical items are in the form of lemma–POS pairings. In order to be useful in our NLP systems, polysemous lexical items need to be disambiguated, and a large amount of modern vocabulary must be added in the proper places in Bring. The work presented here describes experiments aiming at automating these two tasks, at least in part, where we use the structure of an existing Swedish semantic lexicon – Saldo – both for disambiguation of ambiguous Bring entries and for addition of new entries to Bring.

Material Philology Meets Digital Onomastic Lexicography: The NordiCon Database of Medieval Nordic Personal Names in Continental Sources
Michelle Waldispühl | Dana Dannells | Lars Borin
Proceedings of the 12th Language Resources and Evaluation Conference

We present NordiCon, a database containing medieval Nordic personal names attested in Continental sources. The database combines formally interpreted and richly interlinked onomastic data with digitized versions of the medieval manuscripts from which the data originate and information on the tokens’ context. The structure of NordiCon is inspired by other online historical given name dictionaries. It takes up challenges reported on in previous works, such as how to cover material properties of a name token and how to define lemmatization principles, and elaborates on possible solutions. The lemmatization principles for NordiCon are further developed in order to facilitate the connection to other name dictionaries and corpuses, and the integration of the database into Språkbanken Text, an infrastructure containing modern and historical written data.

CLARIN: Distributed Language Resources and Technology in a European Infrastructure
Maria Eskevich | Franciska de Jong | Alexander König | Darja Fišer | Dieter Van Uytvanck | Tero Aalto | Lars Borin | Olga Gerassimenko | Jan Hajic | Henk van den Heuvel | Neeme Kahusk | Krista Liin | Martin Matthiesen | Stelios Piperidis | Kadri Vider
Proceedings of the 1st International Workshop on Language Technology Platforms

CLARIN is a European Research Infrastructure providing access to digital language resources and tools from across Europe and beyond to researchers in the humanities and social sciences. This paper focuses on CLARIN as a platform for the sharing of language resources. It zooms in on the service offer for the aggregation of language repositories and the value proposition for a number of communities that benefit from the enhanced visibility of their data and services as a result of integration in CLARIN. The enhanced findability of language resources is serving the social sciences and humanities (SSH) community at large and supports research communities that aim to collaborate based on virtual collections for a specific domain. The paper also addresses the wider landscape of service platforms based on language technologies which has the potential of becoming a powerful set of interoperable facilities to a variety of communities of use.

From Linguistic Descriptions to Language Profiles
Shafqat Mumtaz Virk | Harald Hammarström | Lars Borin | Markus Forsberg | Søren Wichmann
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)

Language catalogues and typological databases are two important types of resources containing different types of knowledge about the world’s natural languages. The former provide metadata such as number of speakers, location (in prose descriptions and/or GPS coordinates), language code, literacy, etc., while the latter contain information about a set of structural and functional attributes of languages. Given that both types of resources are developed and later maintained manually, there are practical limits as to the number of languages and the number of features that can be surveyed. We introduce the concept of a language profile, which is intended to be a structured representation of various types of knowledge about a natural language extracted semi-automatically from descriptive documents and stored at a central location. It has three major parts: (1) an introductory; (2) an attributive; and (3) a reference part, each containing different types of knowledge about a given natural language. As a case study, we develop and present a language profile of an example language. At this stage, a language profile is an independent entity, but in the future it is envisioned to become part of a network of language profiles connected to each other via various types of relations. Such a representation is expected to be suitable both for humans and machines to read and process for further deeper linguistic analyses and/or comparisons.

2019

Towards Assessing Argumentation Annotation - A First Step
Anna Lindahl | Lars Borin | Jacobo Rouces
Proceedings of the 6th Workshop on Argument Mining

This paper presents a first attempt at using Walton’s argumentation schemes for annotating arguments in Swedish political text and assessing the feasibility of using this particular set of schemes with two linguistically trained annotators. The texts are not pre-annotated with argumentation structure. The results show that the annotators differ both in the number of annotated arguments and in the selection of the conclusion and premises which make up the arguments. They also differ in their labeling of the schemes, but grouping the schemes increases their agreement. The outcome will be used to develop guidelines for future annotations.

Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
Nina Tahmasebi | Lars Borin | Adam Jatowt | Yang Xu
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning
David Alfter | Elena Volodina | Lars Borin | Ildikó Pilán | Herbert Lange
Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning

Exploiting Frame-Semantics and Frame-Semantic Parsing for Automatic Extraction of Typological Information from Descriptive Grammars of Natural Languages
Shafqat Mumtaz Virk | Azam Sheikh Muhammad | Lars Borin | Muhammad Irfan Aslam | Saania Iqbal | Nazia Khurram
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

We describe a novel system for automatic extraction of typological linguistic information from descriptive grammars of natural languages, applying the theory of frame semantics in the form of frame-semantic parsing. The current proof-of-concept system covers a few selected linguistic features, but the methodology is general and can be extended not only to other typological features but also to descriptive grammars written in languages other than English. Such a system is expected to be a useful aid for automatic curation of typological databases, which otherwise are built manually, a very labor- and time-consuming as well as cognitively taxing enterprise.

2018

Generating a Gold Standard for a Swedish Sentiment Lexicon
Jacobo Rouces | Nina Tahmasebi | Lars Borin | Stian Rødven Eide
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

SenSALDO: Creating a Sentiment Lexicon for Swedish
Jacobo Rouces | Nina Tahmasebi | Lars Borin | Stian Rødven Eide
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning
Ildikó Pilán | Elena Volodina | David Alfter | Lars Borin
Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning

2017

Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition
Elena Volodina | Gintarė Grigonytė | Ildikó Pilán | Kristina Nilsson Björkenstam | Lars Borin
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition

2016

Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition
Elena Volodina | Gintarė Grigonytė | Ildikó Pilán | Kristina Nilsson Björkenstam | Lars Borin
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition

2015

Proceedings of the fourth workshop on NLP for computer-assisted language learning
Elena Volodina | Lars Borin | Ildikó Pilán
Proceedings of the fourth workshop on NLP for computer-assisted language learning

Proceedings of the workshop on Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015
Bolette Sandford Pedersen | Sussi Olsen | Lars Borin
Proceedings of the workshop on Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015

Here be dragons? The perils and promises of inter-resource lexical-semantic mapping
Lars Borin | Luis Nieto Piña | Richard Johansson
Proceedings of the workshop on Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015

2014

Swesaurus; or, The Frankenstein Approach to Wordnet Construction
Lars Borin | Markus Forsberg
Proceedings of the Seventh Global Wordnet Conference

Proceedings of the third workshop on NLP for computer-assisted language learning
Elena Volodina | Lars Borin | Ildikó Pilán
Proceedings of the third workshop on NLP for computer-assisted language learning

Linguistic landscaping of South Asia using digital language resources: Genetic vs. areal linguistics
Lars Borin | Anju Saxena | Taraka Rama | Bernard Comrie
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Like many other research fields, linguistics is entering the age of big data. We are now at a point where it is possible to see how new research questions can be formulated - and old research questions addressed from a new angle or established results verified - on the basis of exhaustive collections of data, rather than small, carefully selected samples. For example, South Asia is often mentioned in the literature as a classic example of a linguistic area, but there is no systematic, empirical study substantiating this claim. Examination of genealogical and areal relationships among South Asian languages requires a large-scale quantitative and qualitative comparative study, encompassing more than one language family. Further, such a study cannot be conducted manually, but needs to draw on extensive digitized language resources and state-of-the-art computational tools. We present some preliminary results of our large-scale investigation of the genealogical and areal relationships among the languages of this region, based on the linguistic descriptions available in the 19 tomes of Grierson’s monumental “Linguistic Survey of India” (1903-1927), which is currently being digitized with the aim of turning the linguistic information in the LSI into a digital language resource suitable for a broad array of linguistic investigations.

HFST-SweNER — A New NER Resource for Swedish
Dimitrios Kokkinakis | Jyrki Niemi | Sam Hardwick | Krister Lindén | Lars Borin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Named entity recognition (NER) is a knowledge-intensive information extraction task that is used for recognizing textual mentions of entities that belong to a predefined set of categories, such as locations, organizations and time expressions. NER is a challenging yet essential preprocessing technology for many natural language processing applications, and particularly crucial for language understanding. NER has been actively explored in academia and in industry, especially in recent years due to the advent of social media data. This paper describes the conversion, modeling and adaptation of a Swedish NER system from a hybrid environment, with integrated functionality from various processing components, to the Helsinki Finite-State Transducer Technology (HFST) platform. This new HFST-based NER (HFST-SweNER) is a full-fledged open source implementation that supports a variety of generic named entity types and consists of multiple, reusable resource layers, e.g., various n-gram-based named entity lists (gazetteers).

Bring vs. MTRoget: Evaluating automatic thesaurus translation
Lars Borin | Jens Allwood | Gerard de Melo
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Evaluation of automatic language-independent methods for language technology resource creation is difficult, and confounded by a largely unknown quantity, viz. to what extent typological differences among languages are significant for results achieved for one language or language pair to be applicable across languages generally. In the work presented here, as a simplifying assumption, language-independence is taken as axiomatic within certain specified bounds. We evaluate the automatic translation of Roget’s “Thesaurus” from English into Swedish using an independently compiled Roget-style Swedish thesaurus, S.C. Bring’s “Swedish vocabulary arranged into conceptual classes” (1930). Our expectation is that this explicit evaluation of one of the thesauruses created in the MTRoget project will provide a good estimate of the quality of the other thesauruses created using similar methods.

The Strategic Impact of META-NET on the Regional, National and International Level
Georg Rehm | Hans Uszkoreit | Sophia Ananiadou | Núria Bel | Audronė Bielevičienė | Lars Borin | António Branco | Gerhard Budin | Nicoletta Calzolari | Walter Daelemans | Radovan Garabík | Marko Grobelnik | Carmen García-Mateo | Josef van Genabith | Jan Hajič | Inma Hernáez | John Judge | Svetla Koeva | Simon Krek | Cvetana Krstev | Krister Lindén | Bernardo Magnini | Joseph Mariani | John McNaught | Maite Melero | Monica Monachini | Asunción Moreno | Jan Odijk | Maciej Ogrodniczuk | Piotr Pęzik | Stelios Piperidis | Adam Przepiórkowski | Eiríkur Rögnvaldsson | Michael Rosner | Bolette Pedersen | Inguna Skadiņa | Koenraad De Smedt | Marko Tadić | Paul Thompson | Dan Tufiş | Tamás Váradi | Andrejs Vasiļjevs | Kadri Vider | Jolanta Zabarskaite
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics. This paper documents the initiative’s work throughout Europe in order to boost progress and innovation in our field.

A flexible language learning platform based on language resources and web services
Elena Volodina | Ildikó Pilán | Lars Borin | Therese Lindström Tiedemann
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present Lärka, the language learning platform of Språkbanken (the Swedish Language Bank). It consists of an exercise generator which reuses resources available through Språkbanken: mainly Korp, the corpus infrastructure, and Karp, the lexical infrastructure. Through Lärka we reach new user groups (students and teachers of linguistics as well as second language learners and their teachers) and in this way bring Språkbanken’s resources to them in a relevant format. Lärka can therefore be viewed as a case of real-life language resource evaluation with end users. In this article we describe Lärka’s architecture, its user interface, and the five exercise types that have been released for users so far. The first user evaluation, following in-class usage with students of linguistics, speech therapy and teacher candidates, is presented. The outline of future work concludes the paper.

2013

Nordic and Baltic Wordnets Aligned and Compared through “WordTies”
Bolette Sandford Pedersen | Lars Borin | Markus Forsberg | Neeme Kahusk | Krister Lindén | Jyrki Niemi | Niklas Nisbeth | Lars Nygaard | Heili Orav | Eirikur Rögnvaldsson | Mitchell Seaton | Kadri Vider | Kaarlo Voionmaa
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

Baltic and Nordic Parts of the European Linguistic Infrastructure
Inguna Skadiņa | Andrejs Vasiļjevs | Lars Borin | Krister Lindén | Gyri Losnegaard | Sussi Olsen | Bolette Sandford Pedersen | Roberts Rozis | Koenraad De Smedt
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

Korp and Karp – A Bestiary of Language Resources: The Research Infrastructure of Språkbanken
Malin Ahlberg | Lars Borin | Markus Forsberg | Martin Hammarstedt | Leif-Jöran Olsson | Olof Olsson | Johan Roxendal | Jonatan Uppström
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

Korp — the corpus infrastructure of Språkbanken
Lars Borin | Markus Forsberg | Johan Roxendal
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present Korp, the corpus infrastructure of Språkbanken (the Swedish Language Bank). The infrastructure consists of three main components: the Korp corpus pipeline, the Korp backend, and the Korp frontend. The Korp corpus pipeline is used for importing corpora, annotating them, and then exporting the annotated corpora into different formats. An essential feature of the pipeline is the ability to leave existing annotations untouched, both structural and word level annotations, and to use the existing annotations as the foundation of other annotations. The Korp backend consists of a set of REST-based web services for searching in and retrieving information about the corpora. Finally, the Korp frontend is a graphical search interface that interacts with the Korp backend. The interface has been inspired by corpus search interfaces such as SketchEngine, Glossa, and DeepDict, and it uses State Chart XML (SCXML) in order to enable users to bookmark interaction states. We give a functional and technical overview of the three components, followed by a discussion of planned future work.

The open lexical infrastructure of Språkbanken
Lars Borin | Markus Forsberg | Leif-Jöran Olsson | Jonatan Uppström
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present our ongoing work on Karp, the open lexical infrastructure of Språkbanken (the Swedish Language Bank), which has two main functions: (1) to support the work on creating, curating, and integrating our various lexical resources; and (2) to publish daily versions of the resources, making them searchable and downloadable. An important requirement for the lexical infrastructure is also that we maintain a strong bidirectional connection to our corpus infrastructure. At the heart of the infrastructure is the SweFN++ project, whose goal is to create free Swedish lexical resources geared towards language technology applications. The infrastructure currently hosts 15 Swedish lexical resources, including historical ones, some of which have been created from scratch using existing free resources, both external and in-house. The resources are integrated through links to a pivot lexical resource, SALDO, a large morphological and lexical-semantic resource for modern Swedish. SALDO has been selected as the pivot partly because of its size and quality, but also because its form and sense units have been assigned persistent identifiers (PIDs) to which the lexical information in other lexical resources and in corpora is linked.

Toward Language Independent Methodology for Generating Artwork Descriptions – Exploring FrameNet Information
Dana Dannélls | Lars Borin
Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Transferring Frames: Utilization of Linked Lexical Resources
Lars Borin | Markus Forsberg | Richard Johansson | Kristiina Muhonen | Tanja Purtonen | Kaarlo Voionmaa
Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

Search Result Diversification Methods to Assist Lexicographers
Lars Borin | Markus Forsberg | Karin Friberg Heppin | Richard Johansson | Annika Kjellandsson
Proceedings of the Sixth Linguistic Annotation Workshop

2011

META-NORD: Towards Sharing of Language Resources in Nordic and Baltic Countries
Inguna Skadiņa | Andrejs Vasiļjevs | Lars Borin | Koenraad De Smedt | Krister Lindén | Eiríkur Rögnvaldsson
Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm

Semantic search in literature as an e-Humanities research tool: CONPLISIT – Consumption patterns and life-style in 19th century Swedish literature
Lars Borin | Markus Forsberg | Christer Ahlberger
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

Estimating Language Relationships from a Parallel Corpus. A Study of the Europarl Corpus
Taraka Rama | Lars Borin
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

Dialect classification in the Himalayas: a computational approach
Anju Saxena | Lars Borin
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

Unsupervised Learning of Morphology
Harald Hammarström | Lars Borin
Computational Linguistics, Volume 37, Issue 2 - June 2011

2010

Diabase: Towards a Diachronic BLARK in Support of Historical Studies
Lars Borin | Markus Forsberg | Dimitrios Kokkinakis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present our ongoing work on language technology-based e-science in the humanities, social sciences and education, with a focus on text-based research in the historical sciences. An important aspect of language technology is the research infrastructure known by the acronym BLARK (Basic LAnguage Resource Kit). A BLARK as normally presented in the literature arguably reflects a modern standard language, which is topic- and genre-neutral, thus abstracting away from all kinds of language variation. We argue that this notion could fruitfully be extended along any of the three axes implicit in this characterization (the social, the topical and the temporal), in our case the temporal axis, towards a diachronic BLARK for Swedish, which can be used to develop e-science tools in support of historical studies.

Resource and Service Centres as the Backbone for a Sustainable Service Infrastructure
Peter Wittenburg | Nuria Bel | Lars Borin | Gerhard Budin | Nicoletta Calzolari | Eva Hajicova | Kimmo Koskenniemi | Lothar Lemnitzer | Bente Maegaard | Maciej Piasecki | Jean-Marie Pierrel | Stelios Piperidis | Inguna Skadina | Dan Tufis | Remco van Veenendaal | Tamas Váradi | Martin Wynne
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Currently, research infrastructures are being designed and established in many disciplines, since they all suffer from an enormous fragmentation of their resources and tools. In the domain of language resources and tools, the CLARIN initiative has been funded since 2008 to overcome many of the integration and interoperability hurdles. CLARIN can build on knowledge and work from many projects that were carried out in recent years and wants to build stable and robust services that can be used by researchers. Service centres that have the potential to be persistent and that adhere to the criteria established by CLARIN will play an important role here. In the last year of the so-called preparatory phase, these centres are developing four use cases that can demonstrate how the various pillars CLARIN has been working on can be integrated. All four use cases fulfil the criterion of being cross-national.

2009

Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH – SHELT&R 2009)
Lars Borin | Piroska Lendvai
Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH – SHELT&R 2009)

2007

Naming the Past: Named Entity and Animacy Recognition in 19th Century Swedish Literature
Lars Borin | Dimitrios Kokkinakis | Leif-Jöran Olsson
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007)

2002

Living off the land: The Web as a source of practice texts for learners of less prevalent languages
Kristina Nilsson | Lars Borin
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

Something Borrowed, Something Blue: Rule-based Combination of POS Taggers
Lars Borin
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

You’ll Take the High Road and I’ll Take the Low Road: Using a Third Language to Improve Bilingual Word Alignment
Lars Borin
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

Pivot Alignment
Lars Borin
Proceedings of the 12th Nordic Conference of Computational Linguistics (NODALIDA 1999)

1999

A Corpus-Based Grammar Tutor for Education in Language and Speech Technology
Lars Borin | Mats Dahllöf
EACL 1999: Computer and Internet Supported Education in Language and Speech Technology

1998

Linguistics isn’t always the answer: Word comparison in computational linguistics
Lars Borin
Proceedings of the 11th Nordic Conference of Computational Linguistics (NODALIDA 1998)

1988

A constraint-based approach to morphological analysis (preliminaries)
Lars Borin
Proceedings of the 6th Nordic Conference of Computational Linguistics (NODALIDA 1987)

1986

What is a lexical representation?
Lars Borin
Proceedings of the 5th Nordic Conference of Computational Linguistics (NODALIDA 1985)

1984

Ett textdatabassystem för lingvister (A text database system for linguists) [In Swedish]
Lars Borin
Proceedings of the 4th Nordic Conference of Computational Linguistics (NODALIDA 1983)
