Adam Ek


2020

pdf bib
How does Punctuation Affect Neural Models in Natural Language Inference
Adam Ek | Jean-Philippe Bernardy | Stergios Chatzikyriakidis
Proceedings of the Probability and Meaning Conference (PaM 2020)

Natural Language Inference models have reached almost human-level performance but their generalisation capabilities have not been yet fully characterized. In particular, sensitivity to small changes in the data is a current area of investigation. In this paper, we focus on the effect of punctuation on such models. Our findings can be broadly summarized as follows: (1) irrelevant changes in punctuation are correctly ignored by the recent transformer models (BERT) while older RNN-based models were sensitive to them. (2) All models, both transformers and RNN-based models, are incapable of taking into account small relevant changes in the punctuation.

pdf bib
How Much of Enhanced UD Is Contained in UD?
Adam Ek | Jean-Philippe Bernardy
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

In this paper, we present the submission of team CLASP to the IWPT 2020 Shared Task on parsing enhanced universal dependencies. We develop a tree-to-graph transformation algorithm based on dependency patterns. This algorithm can transform gold UD trees to EUD graphs with an ELAS score of 81.55 and a EULAS score of 96.70. These results show that much of the information needed to construct EUD graphs from UD trees are present in the UD trees. Coupled with a standard UD parser, the method applies to the official test data and yields and ELAS score of 67.85 and a EULAS score is 80.18.

pdf bib
Composing Byte-Pair Encodings for Morphological Sequence Classification
Adam Ek | Jean-Philippe Bernardy
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

Byte-pair encodings is a method for splitting a word into sub-word tokens, a language model then assigns contextual representations separately to each of these tokens. In this paper, we evaluate four different methods of composing such sub-word representations into word representations. We evaluate the methods on morphological sequence classification, the task of predicting grammatical features of a word. Our experiments reveal that using an RNN to compute word representations is consistently more effective than the other methods tested across a sample of eight languages with different typology and varying numbers of byte-pair tokens per word.

2019

pdf bib
Synthetic Propaganda Embeddings To Train A Linear Projection
Adam Ek | Mehdi Ghanimifard
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

This paper presents a method of detecting fine-grained categories of propaganda in text. Given a sentence, our method aims to identify a span of words and predict the type of propaganda used. To detect propaganda, we explore a method for extracting features of propaganda from contextualized embeddings without fine-tuning the large parameters of the base model. We show that by generating synthetic embeddings we can train a linear function with ReLU activation to extract useful labeled embeddings from an embedding space generated by a general-purpose language model. We also introduce an inference technique to detect continuous spans in sequences of propaganda tokens in sentences. A result of the ensemble model is submitted to the first shared task in fine-grained propaganda detection at NLP4IF as Team Stalin. In this paper, we provide additional analysis regarding our method of detecting spans of propaganda with synthetically generated representations.

pdf bib
Language Modeling with Syntactic and Semantic Representation for Sentence Acceptability Predictions
Adam Ek | Jean-Philippe Bernardy | Shalom Lappin
Proceedings of the 22nd Nordic Conference on Computational Linguistics

In this paper, we investigate the effect of enhancing lexical embeddings in LSTM language models (LM) with syntactic and semantic representations. We evaluate the language models using perplexity, and we evaluate the performance of the models on the task of predicting human sentence acceptability judgments. We train LSTM language models on sentences automatically annotated with universal syntactic dependency roles (Nivre, 2016), dependency depth and universal semantic tags (Abzianidze et al., 2017) to predict sentence acceptability judgments. Our experiments indicate that syntactic tags lower perplexity, while semantic tags increase it. Our experiments also show that neither syntactic nor semantic tags improve the performance of LSTM language models on the task of predicting sentence acceptability judgments.

2018

pdf bib
Identifying Speakers and Addressees in Dialogues Extracted from Literary Fiction
Adam Ek | Mats Wirén | Robert Östling | Kristina N. Björkenstam | Gintarė Grigonytė | Sofia Gustafson Capková
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
Mainstreaming August Strindberg with Text Normalization
Adam Ek | Sofia Knuutinen
Proceedings of the 21st Nordic Conference on Computational Linguistics