Emmanuele Chersoni


2020

pdf bib
Are Word Embeddings Really a Bad Fit for the Estimation of Thematic Fit?
Emmanuele Chersoni | Ludovica Pannitto | Enrico Santus | Alessandro Lenci | Chu-Ren Huang
Proceedings of the 12th Language Resources and Evaluation Conference

While neural embeddings represent a popular choice for word representation in a wide variety of NLP tasks, their usage for thematic fit modeling has been limited, as they have been reported to lag behind syntax-based count models. In this paper, we propose a complete evaluation of count models and word embeddings on thematic fit estimation, by taking into account a larger number of parameters and verb roles and introducing also dependency-based embeddings in the comparison. Our results show a complex scenario, where a determinant factor for the performance seems to be the availability to the model of reliable syntactic information for building the distributional representations of the roles.

pdf bib
Ciron: a New Benchmark Dataset for Chinese Irony Detection
Rong Xiang | Xuefeng Gao | Yunfei Long | Anran Li | Emmanuele Chersoni | Qin Lu | Chu-Ren Huang
Proceedings of the 12th Language Resources and Evaluation Conference

Automatic Chinese irony detection is a challenging task, and it has a strong impact on linguistic research. However, Chinese irony detection often lacks labeled benchmark datasets. In this paper, we introduce Ciron, the first Chinese benchmark dataset available for irony detection for machine learning models. Ciron includes more than 8.7K posts, collected from Weibo, a micro blogging platform. Most importantly, Ciron is collected with no pre-conditions to ensure a much wider coverage. Evaluation on seven different machine learning classifiers proves the usefulness of Ciron as an important resource for Chinese irony detection.

pdf bib
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Emmanuele Chersoni | Cassandra Jacobs | Yohei Oseki | Laurent Prévot | Enrico Santus
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

pdf bib
Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources
Emmanuele Chersoni | Barry Devereux | Chu-Ren Huang
Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources

pdf bib
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon
Michael Zock | Emmanuele Chersoni | Alessandro Lenci | Enrico Santus
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon

pdf bib
The CogALex Shared Task on Monolingual and Multilingual Identification of Semantic Relations
Rong Xiang | Emmanuele Chersoni | Luca Iacoponi | Enrico Santus
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon

The shared task of the CogALex-VI workshop focuses on the monolingual and multilingual identification of semantic relations. We provided training and validation data for the following languages: English, German and Chinese. Given a word pair, systems had to be trained to identify which relation holds between them, with possible choices being synonymy, antonymy, hypernymy and no relation at all. Two test sets were released for evaluating the participating systems. One containing pairs for each of the training languages (systems were evaluated in a monolingual fashion) and the other proposing a surprise language to test the crosslingual transfer capabilities of the systems. Among the submitted systems, top performance was achieved by a transformer-based model in both the monolingual and in the multilingual setting, for all the tested languages, proving the potentials of this recently-introduced neural architecture. The shared task description and the results are available at https://sites.google.com/site/cogalexvisharedtask/.

pdf bib
Using Conceptual Norms for Metaphor Detection
Mingyu Wan | Kathleen Ahrens | Emmanuele Chersoni | Menghan Jiang | Qi Su | Rong Xiang | Chu-Ren Huang
Proceedings of the Second Workshop on Figurative Language Processing

This paper reports a linguistically-enriched method of detecting token-level metaphors for the second shared task on Metaphor Detection. We participate in all four phases of competition with both datasets, i.e. Verbs and AllPOS on the VUA and the TOFEL datasets. We use the modality exclusivity and embodiment norms for constructing a conceptual representation of the nodes and the context. Our system obtains an F-score of 0.652 for the VUA Verbs track, which is 5% higher than the strong baselines. The experimental results across models and datasets indicate the salient contribution of using modality exclusivity and modality shift information for predicting metaphoricity.

pdf bib
Automatic Learning of Modality Exclusivity Norms with Crosslingual Word Embeddings
Emmanuele Chersoni | Rong Xiang | Qin Lu | Chu-Ren Huang
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

Collecting modality exclusivity norms for lexical items has recently become a common practice in psycholinguistics and cognitive research. However, these norms are available only for a relatively small number of languages and often involve a costly and time-consuming collection of ratings. In this work, we aim at learning a mapping between word embeddings and modality norms. Our experiments focused on crosslingual word embeddings, in order to predict modality association scores by training on a high-resource language and testing on a low-resource one. We ran two experiments, one in a monolingual and the other one in a crosslingual setting. Results show that modality prediction using off-the-shelf crosslingual embeddings indeed has moderate-to-high correlations with human ratings even when regression algorithms are trained on an English resource and tested on a completely unseen language.

2019

pdf bib
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Emmanuele Chersoni | Cassandra Jacobs | Alessandro Lenci | Tal Linzen | Laurent Prévot | Enrico Santus
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

pdf bib
Distributional Semantics Meets Construction Grammar. towards a Unified Usage-Based Model of Grammar and Meaning
Giulia Rambelli | Emmanuele Chersoni | Philippe Blache | Chu-Ren Huang | Alessandro Lenci
Proceedings of the First International Workshop on Designing Meaning Representations

In this paper, we propose a new type of semantic representation of Construction Grammar that combines constructions with the vector representations used in Distributional Semantics. We introduce a new framework, Distributional Construction Grammar, where grammar and meaning are systematically modeled from language use, and finally, we discuss the kind of contributions that distributional models can provide to CxG representation from a linguistic and cognitive perspective.

pdf bib
PolyU_CBS-CFA at the FinSBD Task: Sentence Boundary Detection of Financial Data with Domain Knowledge Enhancement and Bilingual Training
Mingyu Wan | Rong Xiang | Emmanuele Chersoni | Natalia Klyueva | Kathleen Ahrens | Bin Miao | David Broadstock | Jian Kang | Amos Yung | Chu-Ren Huang
Proceedings of the First Workshop on Financial Technology and Natural Language Processing

2018

pdf bib
BomJi at SemEval-2018 Task 10: Combining Vector-, Pattern- and Graph-based Information to Identify Discriminative Attributes
Enrico Santus | Chris Biemann | Emmanuele Chersoni
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes BomJi, a supervised system for capturing discriminative attributes in word pairs (e.g. yellow as discriminative for banana over watermelon). The system relies on an XGB classifier trained on carefully engineered graph-, pattern- and word embedding-based features. It participated in the SemEval-2018 Task 10 on Capturing Discriminative Attributes, achieving an F1 score of 0.73 and ranking 2nd out of 26 participant systems.

pdf bib
A Rank-Based Similarity Metric for Word Embeddings
Enrico Santus | Hongmin Wang | Emmanuele Chersoni | Yue Zhang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Word Embeddings have recently imposed themselves as a standard for representing word meaning in NLP. Semantic similarity between word pairs has become the most common evaluation benchmark for these representations, with vector cosine being typically used as the only similarity metric. In this paper, we report experiments with a rank-based metric for WE, which performs comparably to vector cosine in similarity estimation and outperforms it in the recently-introduced and challenging task of outlier detection, thus suggesting that rank-based measures can improve clustering quality.

pdf bib
Modeling Violations of Selectional Restrictions with Distributional Semantics
Emmanuele Chersoni | Adrià Torrens Urrutia | Philippe Blache | Alessandro Lenci
Proceedings of the Workshop on Linguistic Complexity and Natural Language Processing

Distributional Semantic Models have been successfully used for modeling selectional preferences in a variety of scenarios, since distributional similarity naturally provides an estimate of the degree to which an argument satisfies the requirement of a given predicate. However, we argue that the performance of such models on rare verb-argument combinations has received relatively little attention: it is not clear whether they are able to distinguish the combinations that are simply atypical, or implausible, from the semantically anomalous ones, and in particular, they have never been tested on the task of modeling their differences in processing complexity. In this paper, we compare two different models of thematic fit by testing their ability of identifying violations of selectional restrictions in two datasets from the experimental studies.

2017

pdf bib
Logical Metonymy in a Distributional Model of Sentence Comprehension
Emmanuele Chersoni | Alessandro Lenci | Philippe Blache
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

In theoretical linguistics, logical metonymy is defined as the combination of an event-subcategorizing verb with an entity-denoting direct object (e.g., The author began the book), so that the interpretation of the VP requires the retrieval of a covert event (e.g., writing). Psycholinguistic studies have revealed extra processing costs for logical metonymy, a phenomenon generally explained with the introduction of new semantic structure. In this paper, we present a general distributional model for sentence comprehension inspired by the Memory, Unification and Control model by Hagoort (2013,2016). We show that our distributional framework can account for the extra processing costs of logical metonymy and can identify the covert event in a classification task.

pdf bib
Is Structure Necessary for Modeling Argument Expectations in Distributional Semantics?
Emmanuele Chersoni | Enrico Santus | Philippe Blache | Alessandro Lenci
IWCS 2017 - 12th International Conference on Computational Semantics - Long papers

pdf bib
Measuring Thematic Fit with Distributional Feature Overlap
Enrico Santus | Emmanuele Chersoni | Alessandro Lenci | Philippe Blache
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we introduce a new distributional method for modeling predicate-argument thematic fit judgments. We use a syntax-based DSM to build a prototypical representation of verb-specific roles: for every verb, we extract the most salient second order contexts for each of its roles (i.e. the most salient dimensions of typical role fillers), and then we compute thematic fit as a weighted overlap between the top features of candidate fillers and role prototypes. Our experiments show that our method consistently outperforms a baseline re-implementing a state-of-the-art system, and achieves better or comparable results to those reported in the literature for the other unsupervised systems. Moreover, it provides an explicit representation of the features characterizing verb-specific semantic roles.

2016

pdf bib
Representing Verbs with Rich Contexts: an Evaluation on Verb Similarity
Emmanuele Chersoni | Enrico Santus | Alessandro Lenci | Philippe Blache | Chu-Ren Huang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Towards a Distributional Model of Semantic Complexity
Emmanuele Chersoni | Philippe Blache | Alessandro Lenci
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

In this paper, we introduce for the first time a Distributional Model for computing semantic complexity, inspired by the general principles of the Memory, Unification and Control framework(Hagoort, 2013; Hagoort, 2016). We argue that sentence comprehension is an incremental process driven by the goal of constructing a coherent representation of the event represented by the sentence. The composition cost of a sentence depends on the semantic coherence of the event being constructed and on the activation degree of the linguistic constructions. We also report the results of a first evaluation of the model on the Bicknell dataset (Bicknell et al., 2010).

pdf bib
CogALex-V Shared Task: ROOT18
Emmanuele Chersoni | Giulia Rambelli | Enrico Santus
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

In this paper, we describe ROOT 18, a classifier using the scores of several unsupervised distributional measures as features to discriminate between semantically related and unrelated words, and then to classify the related pairs according to their semantic relation (i.e. synonymy, antonymy, hypernymy, part-whole meronymy). Our classifier participated in the CogALex-V Shared Task, showing a solid performance on the first subtask, but a poor performance on the second subtask. The low scores reported on the second subtask suggest that distributional measures are not sufficient to discriminate between multiple semantic relations at once.

pdf bib
Testing APSyn against Vector Cosine on Similarity Estimation
Enrico Santus | Emmanuele Chersoni | Alessandro Lenci | Chu-Ren Huang | Philippe Blache
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers