Text Zoning and Classification for Job Advertisements in German, French and English

Ann-Sophie Gnehm, Simon Clematide


Abstract
We present experiments to structure job ads into text zones and classify them into professions, industries and management functions, thereby facilitating social science analyses on labor market demand. Our main contributions are empirical findings on the benefits of contextualized embeddings and the potential of multi-task models for this purpose. With contextualized in-domain embeddings in BiLSTM-CRF models, we reach an accuracy of 91% for token-level text zoning and outperform previous approaches. A multi-tasking BERT model performs well for our classification tasks. We further compare transfer approaches for our multilingual data.
Anthology ID:
2020.nlpcss-1.10
Volume:
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
Month:
November
Year:
2020
Address:
Online
Venues:
EMNLP | NLP+CSS
Publisher:
Association for Computational Linguistics
Pages:
83–93
URL:
https://www.aclweb.org/anthology/2020.nlpcss-1.10
DOI:
10.18653/v1/2020.nlpcss-1.10
PDF:
http://aclanthology.lst.uni-saarland.de/2020.nlpcss-1.10.pdf
Optional supplementary material:
 2020.nlpcss-1.10.OptionalSupplementaryMaterial.zip