Douglas Biber


2019

Toward Multilingual Identification of Online Registers
Veronika Laippala | Roosa Kyllönen | Jesse Egbert | Douglas Biber | Sampo Pyysalo
Proceedings of the 22nd Nordic Conference on Computational Linguistics

We consider cross- and multilingual text classification approaches to the identification of online registers (genres), i.e. text varieties with specific situational characteristics. Register is the most important predictor of linguistic variation, and register information could improve the potential of online data for many applications. We introduce the first manually annotated non-English corpus of online registers featuring the full range of linguistic variation found online. The data set consists of 2,237 Finnish documents and follows the register taxonomy developed for the Corpus of Online Registers of English (CORE). Using CORE and the newly introduced corpus, we demonstrate the feasibility of cross-lingual register identification using a simple approach based on convolutional neural networks and multilingual word embeddings. We further find that register identification results can be improved through multilingual training even when a substantial number of annotations is available in the target language.

1999

Book Reviews: Exploring Textual Data
Douglas Biber
Computational Linguistics, Volume 25, Number 1, March 1999

1993

Using Register-Diversified Corpora for General Language Studies
Douglas Biber
Computational Linguistics, Volume 19, Number 2, June 1993, Special Issue on Using Large Corpora: II

Co-occurrence Patterns among Collocations: A Tool for Corpus-Based Lexical Knowledge Acquisition
Douglas Biber
Computational Linguistics, Volume 19, Number 3, September 1993

1992

Book Reviews: English Computer Corpora: Selected Papers and Research Guide
Douglas Biber
Computational Linguistics, Volume 18, Number 4, December 1992