Text Zoning and Classification for Job Advertisements in German, French and English
Ann-Sophie Gnehm, Simon Clematide
Abstract
We present experiments to structure job ads into text zones and classify them into professions, industries and management functions, thereby facilitating social science analyses on labor market demand. Our main contributions are empirical findings on the benefits of contextualized embeddings and the potential of multi-task models for this purpose. With contextualized in-domain embeddings in BiLSTM-CRF models, we reach an accuracy of 91% for token-level text zoning and outperform previous approaches. A multi-tasking BERT model performs well for our classification tasks. We further compare transfer approaches for our multilingual data.
- Anthology ID:
- 2020.nlpcss-1.10
- Volume:
- Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venues:
- EMNLP | NLP+CSS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 83–93
- URL:
- https://www.aclweb.org/anthology/2020.nlpcss-1.10
- DOI:
- 10.18653/v1/2020.nlpcss-1.10
- PDF:
- http://aclanthology.lst.uni-saarland.de/2020.nlpcss-1.10.pdf