Adam Jatowt


2020

Dataset for Temporal Analysis of English-French Cognates
Esteban Frossard | Mickael Coustaty | Antoine Doucet | Adam Jatowt | Simon Hengchen
Proceedings of the 12th Language Resources and Evaluation Conference

Languages change over time and, thanks to the abundance of digital corpora, their evolutionary analysis using computational techniques has recently gained much research attention. In this paper, we focus on creating a dataset to support investigating the similarity in evolution between different languages. We look in particular into the similarities and differences between the use of corresponding words across time in English and French, two languages from different linguistic families yet with shared syntax and close contact. For this we select a set of cognates in both languages and study their frequency changes and correlations over time. We propose a new dataset for computational approaches to the synchronized diachronic investigation of language pairs, and subsequently show novel findings stemming from the cognate-focused diachronic comparison of the two chosen languages. To the best of our knowledge, the present study is the first in the literature to use computational approaches and large data to conduct a cross-language diachronic analysis.

Annotating and Analyzing Biased Sentences in News Articles using Crowdsourcing
Sora Lim | Adam Jatowt | Michael Färber | Masatoshi Yoshikawa
Proceedings of the 12th Language Resources and Evaluation Conference

The spread of biased news and its consumption by readers has become a considerable issue. Researchers from multiple domains, including social science and media studies, have made efforts to mitigate this media bias issue. Specifically, various techniques ranging from natural language processing to machine learning have been used to help determine news bias automatically. However, due to the lack of publicly available datasets in this field, especially ones containing labels concerning bias on a fine-grained level (e.g., on the sentence level), it is still challenging to develop methods for effectively identifying bias embedded in news articles. In this paper, we propose a novel news bias dataset which facilitates the development and evaluation of approaches for detecting subtle bias in news articles and for understanding the characteristics of biased sentences. Our dataset consists of 966 sentences from 46 English-language news articles covering 4 different events and contains labels concerning bias on the sentence level. For scalability reasons, the labels were obtained through crowdsourcing. Our dataset can be used for analyzing news bias, as well as for developing and evaluating methods for news bias detection. It can also serve as a resource for related research, including work focusing on fake news detection.

Multilingual Epidemiological Text Classification: A Comparative Study
Stephen Mutuvi | Emanuela Boros | Antoine Doucet | Adam Jatowt | Gaël Lejeune | Moses Odeo
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we approach the multilingual text classification task in the context of the epidemiological field. Multilingual text classification models tend to perform differently across different languages (low- or high-resourced), particularly when the dataset is highly imbalanced, which is the case for epidemiological datasets. We conduct a comparative study of different machine and deep learning text classification models using a dataset comprising news articles related to epidemic outbreaks from six languages, four low-resourced and two high-resourced, in order to analyze the influence of the nature of the language, the structure of the document, and the size of the data. Our findings indicate that the performance of the models based on fine-tuned language models exceeds that of the chosen baseline models, which include a specialized epidemiological news surveillance system and several machine learning models, by more than 50%. Also, low-resource languages are highly influenced not only by the typology of the languages on which the models have been pre-trained and/or fine-tuned but also by their size. Furthermore, we discover that the beginning and the end of documents provide the most salient features for this task and, as expected, the performance of the models was proportionate to the training data size.

2019

Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
Nina Tahmasebi | Lars Borin | Adam Jatowt | Yang Xu
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

Spatio-Temporal Prediction of Dialectal Variant Usage
Péter Jeszenszky | Panote Siriaraya | Philipp Stoeckle | Adam Jatowt
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

The distribution of most dialectal variants has not only spatial but also temporal patterns. Based on the ‘apparent time hypothesis’, much dialect change happens through younger speakers accepting innovations. Thus, synchronic diversity can be interpreted diachronically. With the assumption of the ‘contact effect’, i.e. contact possibility (contact and isolation) between speaker communities being responsible for language change, and the apparent time hypothesis, we aim to predict the usage of dialectal variants. In this paper we model the contact possibility based on two of the factors that sociolinguistics considers most important in affecting language change: age and distance. The first steps of the approach involve modeling contact possibility using a logistic predictor, taking the age of respondents into account. We test the global and the local role of age for variation, where the local level means spatial subsets around each survey site, chosen based on k nearest neighbors. The prediction approach is tested on Swiss German syntactic survey data, featuring multiple respondents from different age cohorts at survey sites. The results show the relative success of the logistic prediction approach and the limitations of the method; further proposals are therefore made to develop the methodology.

2018

A High-Quality Gold Standard for Citation-based Tasks
Michael Färber | Alexander Thiemann | Adam Jatowt
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Multi-Attention based Neural Network with External Knowledge for Story Ending Predicting Task
Qian Li | Ziwei Li | Jin-Mao Wei | Yanhui Gu | Adam Jatowt | Zhenglu Yang
Proceedings of the 27th International Conference on Computational Linguistics

Enabling a mechanism to understand a temporal story and predict its ending is an interesting issue that has attracted considerable attention, as in the case of the ROC Story Cloze Task (SCT). In this paper, we develop a multi-attention-based neural network (MANN) with well-designed optimizations, such as a Highway Network, and concatenate features with embedding representations in the hierarchical neural network model. Considering the particulars of the specific task, we extend MANN with external knowledge resources, clearly exceeding state-of-the-art results. Furthermore, we develop a thorough understanding of our model through a careful hand analysis of a subset of the stories. We identify which traits of MANN contribute to its superior performance and how external knowledge is obtained in such an ending prediction task.

2016

HistoryComparator: Interactive Across-Time Comparison in Document Archives
Adam Jatowt | Marc Bron
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Recent years have witnessed a significant increase in the number of large-scale digital collections of archival documents such as news articles, books, etc. Typically, users access these collections through searching or browsing. In this paper we investigate another way of accessing temporal collections: across-time comparison, i.e., comparing query-relevant information at different periods in the past. We propose an interactive framework called HistoryComparator for contrastively analyzing concepts in archival document collections at different time periods.

2015

Omnia Mutantur, Nihil Interit: Connecting Past with Present by Finding Corresponding Terms across Time
Yating Zhang | Adam Jatowt | Sourav Bhowmick | Katsumi Tanaka
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)