Text Zoning and Classification for Job Advertisements in German, French and English
Ann-Sophie Gnehm | Simon Clematide
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
We present experiments to structure job ads into text zones and classify them into pro- fessions, industries and management functions, thereby facilitating social science analyses on labor marked demand. Our main contribution are empirical findings on the benefits of contextualized embeddings and the potential of multi-task models for this purpose. With contextualized in-domain embeddings in BiLSTM-CRF models, we reach an accuracy of 91% for token-level text zoning and outperform previous approaches. A multi-tasking BERT model performs well for our classification tasks. We further compare transfer approaches for our multilingual data.