New Developments in the Polish Parliamentary Corpus

Maciej Ogrodniczuk, Bartłomiej Nitoń


Abstract
This short paper presents the current (as of February 2020) state of preparation of the Polish Parliamentary Corpus (PPC), an extensive collection of transcripts of Polish parliamentary proceedings dating from 1919 to the present. The most evident developments compared to the 2018 version are the harmonization of metadata, the standardization of document identifiers, the upload of the contents of all documents and their metadata to the database (to enable easier modification, maintenance and future development of the corpus), the linking of utterances to the political ontology, the linking of corpus texts to source data, and the processing of historical documents.
Anthology ID: 2020.parlaclarin-1.1
Volume: Proceedings of the Second ParlaCLARIN Workshop
Month: May
Year: 2020
Address: Marseille, France
Venues: LREC | ParlaCLARIN | WS
Publisher: European Language Resources Association
Pages: 1–4
Language: English
URL: https://www.aclweb.org/anthology/2020.parlaclarin-1.1
PDF: http://aclanthology.lst.uni-saarland.de/2020.parlaclarin-1.1.pdf