From Linguistic Descriptions to Language Profiles

Shafqat Mumtaz Virk, Harald Hammarström, Lars Borin, Markus Forsberg, Søren Wichmann


Abstract
Language catalogues and typological databases are two important types of resources containing different types of knowledge about the world’s natural languages. The former provide metadata such as number of speakers, location (in prose descriptions and/or GPS coordinates), language code, literacy, etc., while the latter contain information about a set of structural and functional attributes of languages. Given that both types of resources are developed and later maintained manually, there are practical limits as to the number of languages and the number of features that can be surveyed. We introduce the concept of a language profile, which is intended to be a structured representation of various types of knowledge about a natural language extracted semi-automatically from descriptive documents and stored at a central location. It has three major parts: (1) an introductory; (2) an attributive; and (3) a reference part, each containing different types of knowledge about a given natural language. As a case study, we develop and present a language profile of an example language. At this stage, a language profile is an independent entity, but in the future it is envisioned to become part of a network of language profiles connected to each other via various types of relations. Such a representation is expected to be suitable for both humans and machines to read and process for deeper linguistic analyses and/or comparisons.
Anthology ID: 2020.ldl-1.4
Volume: Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)
Month: May
Year: 2020
Address: Marseille, France
Venues: LDL | LREC | WS
Publisher: European Language Resources Association
Pages: 23–27
Language: English
URL: https://www.aclweb.org/anthology/2020.ldl-1.4
PDF: http://aclanthology.lst.uni-saarland.de/2020.ldl-1.4.pdf