Mohammad Taher Pilehvar


2020

Will-They-Won’t-They: A Very Large Dataset for Stance Detection on Twitter
Costanza Conforti | Jakob Berndt | Mohammad Taher Pilehvar | Chryssi Giannitsarou | Flavio Toxvaerd | Nigel Collier
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present a new challenging stance detection dataset, called Will-They-Won’t-They (WT–WT), which contains 51,284 tweets in English, making it by far the largest available dataset of the type. All the annotations are carried out by experts; therefore, the dataset constitutes a high-quality and reliable benchmark for future research in stance detection. Our experiments with a wide range of recent state-of-the-art stance detection systems show that the dataset poses a strong challenge to existing models in this domain.

SemEval-2020 Task 3: Graded Word Similarity in Context
Carlos Santos Armendariz | Matthew Purver | Senja Pollak | Nikola Ljubešić | Matej Ulčar | Ivan Vulić | Mohammad Taher Pilehvar
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents the Graded Word Similarity in Context (GWSC) task which asked participants to predict the effects of context on human perception of similarity in English, Croatian, Slovene and Finnish. We received 15 submissions and 11 system description papers. A new dataset (CoSimLex) was created for evaluation in this task: it contains pairs of words, each annotated within two different contexts. Systems beat the baselines by significant margins, but few did well in more than one language or subtask. Almost every system employed a Transformer model, but with many variations in the details: WordNet sense embeddings, translation of contexts, TF-IDF weightings, and the automatic creation of datasets for fine-tuning were all used to good effect.

STANDER: An Expert-Annotated Dataset for News Stance Detection and Evidence Retrieval
Costanza Conforti | Jakob Berndt | Mohammad Taher Pilehvar | Chryssi Giannitsarou | Flavio Toxvaerd | Nigel Collier
Findings of the Association for Computational Linguistics: EMNLP 2020

We present a new challenging news dataset that targets both stance detection (SD) and fine-grained evidence retrieval (ER). With its 3,291 expert-annotated articles, the dataset constitutes a high-quality benchmark for future research in SD and multi-task learning. We provide a detailed description of the corpus collection methodology and carry out an extensive analysis on the sources of disagreement between annotators, observing a correlation between their disagreement and the diffusion of uncertainty around a target in the real world. Our experiments show that the dataset poses a strong challenge to recent state-of-the-art models. Notably, our dataset aligns with an existing Twitter SD dataset: their union thus addresses a key shortcoming of previous works, by providing the first dedicated resource to study multi-genre SD as well as the interplay of signals from social media and news sources in rumour verification.

Embeddings in Natural Language Processing
Jose Camacho-Collados | Mohammad Taher Pilehvar
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts

Embeddings have been one of the most important topics of interest in NLP for the past decade. Representing knowledge through a low-dimensional vector which is easily integrable in modern machine learning models has played a central role in the development of the field. Embedding techniques initially focused on words but the attention soon started to shift to other forms. This tutorial will provide a high-level synthesis of the main embedding techniques in NLP, in the broad sense. We will start with conventional word embeddings (e.g., Word2Vec and GloVe) and then move to other types of embeddings, such as sense-specific and graph alternatives. We will conclude with an overview of the trending contextualized representations (e.g., ELMo and BERT) and explain their potential and impact in NLP.

XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization
Alessandro Raganato | Tommaso Pasini | Jose Camacho-Collados | Mohammad Taher Pilehvar
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The ability to correctly model distinct meanings of a word is crucial for the effectiveness of semantic representation techniques. However, most existing evaluation benchmarks for assessing this criterion are tied to sense inventories (usually WordNet), restricting their usage to a small subset of knowledge-based representation techniques. The Word-in-Context dataset (WiC) addresses the dependence on sense inventories by reformulating the standard disambiguation task as a binary classification problem, but it is limited to the English language. We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages from varied language families and with different degrees of resource availability, opening room for evaluation scenarios such as zero-shot cross-lingual transfer. We perform a series of experiments to determine the reliability of the datasets and to set performance baselines for several recent contextualized multilingual models. Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance in the task of distinguishing different meanings of a word, even for distant languages. XL-WiC is available at https://pilehvar.github.io/xlwic/.

2019

On the Importance of the Kullback-Leibler Divergence Term in Variational Autoencoders for Text Generation
Victor Prokhorov | Ehsan Shareghi | Yingzhen Li | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 3rd Workshop on Neural Generation and Translation

Variational Autoencoders (VAEs) are known to suffer from learning uninformative latent representations of the input due to issues such as approximated posterior collapse or entanglement of the latent space. We impose an explicit constraint on the Kullback-Leibler (KL) divergence term inside the VAE objective function. While the explicit constraint naturally avoids posterior collapse, we use it to further understand the significance of the KL term in controlling the information transmitted through the VAE channel. Within this framework, we explore different properties of the estimated posterior distribution, and highlight the trade-off between the amount of information encoded in a latent code during training and the generative capacity of the model.

Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)
Luis Espinosa-Anke | Thierry Declerck | Dagmar Gromann | Jose Camacho-Collados | Mohammad Taher Pilehvar
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)

WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations
Mohammad Taher Pilehvar | Jose Camacho-Collados
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

By design, word embeddings are unable to model the dynamic nature of words’ semantics, i.e., the property of words to correspond to potentially different meanings. To address this limitation, dozens of specialized meaning representation techniques such as sense or contextualized embeddings have been proposed. However, despite the popularity of research on this topic, very few evaluation benchmarks exist that specifically focus on the dynamic semantics of words. In this paper we show that existing models have surpassed the performance ceiling of the standard evaluation dataset for the purpose, i.e., Stanford Contextual Word Similarity, and highlight its shortcomings. To address the lack of a suitable benchmark, we put forward a large-scale Word in Context dataset, called WiC, based on annotations curated by experts, for generic evaluation of context-sensitive representations. WiC is released at https://pilehvar.github.io/wic/.

Generating Knowledge Graph Paths from Textual Definitions using Sequence-to-Sequence Models
Victor Prokhorov | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We present a novel method for mapping unrestricted text to knowledge graph entities by framing the task as a sequence-to-sequence problem. Specifically, given the encoded state of an input text, our decoder directly predicts paths in the knowledge graph, starting from the root and ending at the target node following hypernym-hyponym relationships. In this way, and in contrast to other text-to-entity mapping systems, our model outputs hierarchically structured predictions that are fully interpretable in the context of the underlying ontology, in an end-to-end manner. We present a proof-of-concept experiment with encouraging results, comparable to those of state-of-the-art systems.

On the Importance of Distinguishing Word Meaning Representations: A Case Study on Reverse Dictionary Mapping
Mohammad Taher Pilehvar
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Meaning conflation deficiency is one of the main limiting factors of word representations which, given their widespread use at the core of many NLP systems, can lead to inaccurate semantic understanding of the input text and inevitably hamper the performance. Sense representations target this problem. However, their potential impact has rarely been investigated in downstream NLP applications. Through a set of experiments on a state-of-the-art reverse dictionary system based on neural networks, we show that a simple adjustment aimed at addressing the meaning conflation deficiency can lead to substantial improvements.

2018

The interplay between lexical resources and Natural Language Processing
Jose Camacho-Collados | Luis Espinosa Anke | Mohammad Taher Pilehvar
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Incorporating linguistic, world and common sense knowledge into AI/NLP systems is currently an important research area, with several open problems and challenges. At the same time, processing and storing this knowledge in lexical resources is not a straightforward task. We propose to address these complementary goals from two methodological perspectives: the use of NLP methods to help the process of constructing and enriching lexical resources and the use of lexical resources for improving NLP applications. This tutorial may be useful for two main types of audience: those working on language resources who are interested in becoming acquainted with automatic NLP techniques, with the end goal of speeding and/or easing up the process of resource curation; and on the other hand, researchers in NLP who would like to benefit from the knowledge of lexical resources to improve their systems and models.

Card-660: Cambridge Rare Word Dataset - a Reliable Benchmark for Infrequent Word Representation Models
Mohammad Taher Pilehvar | Dimitri Kartsaklis | Victor Prokhorov | Nigel Collier
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Rare word representation has recently enjoyed a surge of interest, owing to the crucial role that effective handling of infrequent words can play in accurate semantic understanding. However, there is a paucity of reliable benchmarks for evaluation and comparison of these techniques. We show in this paper that the only existing benchmark (the Stanford Rare Word dataset) suffers from low-confidence annotations and limited vocabulary; hence, it does not constitute a solid comparison framework. In order to fill this evaluation gap, we propose the Cambridge Rare Word Dataset (Card-660), an expert-annotated word similarity dataset which provides a highly reliable, yet challenging, benchmark for rare word representation techniques. Through a set of experiments we show that even the best mainstream word embeddings, with millions of words in their vocabularies, are unable to achieve performances higher than 0.43 (Pearson correlation) on the dataset, compared to a human-level upper bound of 0.90. We release the dataset and the annotation materials at https://pilehvar.github.io/card-660/.

Mapping Text to Knowledge Graph Entities using Multi-Sense LSTMs
Dimitri Kartsaklis | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper addresses the problem of mapping natural language text to knowledge base entities. The mapping process is approached as a composition of a phrase or a sentence into a point in a multi-dimensional entity space obtained from a knowledge graph. The compositional model is an LSTM equipped with a dynamic disambiguation mechanism on the input word embeddings (a Multi-Sense LSTM), addressing polysemy issues. Further, the knowledge base space is prepared by collecting random walks from a graph enhanced with textual features, which act as a set of semantic bridges between text and knowledge base entities. The ideas of this work are demonstrated on large-scale text-to-entity mapping and entity classification tasks, with state-of-the-art results.

Large-scale Exploration of Neural Relation Classification Architectures
Hoang-Quynh Le | Duy-Cat Can | Sinh T. Vu | Thanh Hai Dang | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Experimental performance on the task of relation classification has generally improved using deep neural network architectures. One major drawback of reported studies is that individual models have been evaluated on a very narrow range of datasets, raising questions about the adaptability of the architectures, while making comparisons between approaches difficult. In this work, we present a systematic large-scale analysis of neural relation classification architectures on six benchmark datasets with widely varying characteristics. We propose a novel multi-channel LSTM model combined with a CNN that takes advantage of all currently popular linguistic and architectural features. Our ‘Man for All Seasons’ approach achieves state-of-the-art performance on two datasets. More importantly, in our view, the model allowed us to obtain direct insights into the continued challenges faced by neural language models on this task.

Which Melbourne? Augmenting Geocoding with Maps
Milan Gritta | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The purpose of text geolocation is to associate geographic information contained in a document with a set (or sets) of coordinates, either implicitly by using linguistic features and/or explicitly by using geographic metadata combined with heuristics. We introduce a geocoder (location mention disambiguator) that achieves state-of-the-art (SOTA) results on three diverse datasets by exploiting the implicit lexical clues. Moreover, we propose a new method for systematic encoding of geographic metadata to generate two distinct views of the same text. To that end, we introduce the Map Vector (MapVec), a sparse representation obtained by plotting prior geographic probabilities, derived from population figures, on a World Map. We then integrate the implicit (language) and explicit (map) features to significantly improve a range of metrics. We also introduce an open-source dataset for geoparsing of news events covering global disease outbreaks and epidemics to help future evaluation in geoparsing.

On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis
Jose Camacho-Collados | Mohammad Taher Pilehvar
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact on its final performance. Despite its importance, text preprocessing has not received much attention in the deep learning literature. In this paper we investigate the impact of simple text preprocessing decisions (particularly tokenizing, lemmatizing, lowercasing and multiword grouping) on the performance of a standard neural text classifier. We perform an extensive evaluation on standard benchmarks from text categorization and sentiment analysis. While our experiments show that a simple tokenization of input text is generally adequate, they also highlight significant degrees of variability across preprocessing techniques. This reveals the importance of paying attention to this usually-overlooked step in the pipeline, particularly when comparing different models. Finally, our evaluation provides insights into the best preprocessing practices for training word embeddings.

Towards Automatic Fake News Detection: Cross-Level Stance Detection in News Articles
Costanza Conforti | Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)

In this paper, we propose to adapt the four-staged pipeline proposed by Zubiaga et al. (2018) for the Rumor Verification task to the problem of Fake News Detection. We show that the recently released FNC-1 corpus covers two of its steps, namely the Tracking and the Stance Detection task. We identify asymmetry in length in the input to be a key characteristic of the latter step, when adapted to the framework of Fake News Detection, and propose to handle it as a specific type of Cross-Level Stance Detection. Inspired by theories from the field of Journalism Studies, we implement and test two architectures to successfully model the internal structure of an article and its interactions with a claim.

2017

Vancouver Welcomes You! Minimalist Location Metonymy Resolution
Milan Gritta | Mohammad Taher Pilehvar | Nut Limsopatham | Nigel Collier
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Named entities are frequently used in a metonymic manner. They serve as references to related entities such as people and organisations. Accurate identification and interpretation of metonymy can be directly beneficial to various NLP applications, such as Named Entity Recognition and Geographical Parsing. Until now, metonymy resolution (MR) methods mainly relied on parsers, taggers, dictionaries, external word lists and other handcrafted lexical resources. We show how a minimalist neural approach combined with a novel predicate window method can achieve competitive results on the SemEval 2007 task on Metonymy Resolution. Additionally, we contribute a new Wikipedia-based MR dataset called RelocaR, which is tailored towards locations and addresses previous deficiencies in annotation guidelines.

Towards a Seamless Integration of Word Senses into Downstream NLP Applications
Mohammad Taher Pilehvar | Jose Camacho-Collados | Roberto Navigli | Nigel Collier
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Lexical ambiguity can impede NLP systems from accurate understanding of semantics. Despite its potential benefits, the integration of sense-level information into NLP systems has remained understudied. By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine granularity of the underlying sense inventory is reduced and the document is sufficiently large. Our results also point to the need for sense representation research to focus more on in vivo evaluations which target the performance in downstream NLP applications rather than artificial benchmarks.

Inducing Embeddings for Rare and Unseen Words by Leveraging Lexical Resources
Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We put forward an approach that exploits the knowledge encoded in lexical resources in order to induce representations for words that were not encountered frequently during training. Our approach provides an advantage over the past work in that it enables vocabulary expansion not only for morphological variations, but also for infrequent domain specific terms. We performed evaluations in different settings, showing that the technique can provide consistent improvements on multiple benchmarks across domains.

Word Vector Space Specialisation
Ivan Vulić | Nikola Mrkšić | Mohammad Taher Pilehvar
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Specialising vector spaces to maximise their content with respect to one key property of vector space models (e.g. semantic similarity vs. relatedness or lexical entailment) while mitigating others has become an active and attractive research topic in representation learning. Such specialised vector spaces support different classes of NLP problems. Proposed approaches fall into two broad categories: a) Unsupervised methods which learn from raw textual corpora in more sophisticated ways (e.g. using context selection, extracting co-occurrence information from word patterns, attending over contexts); and b) Knowledge-base driven approaches which exploit available resources to encode external information into distributional vector spaces, injecting knowledge from semantic lexicons (e.g., WordNet, FrameNet, PPDB). In this tutorial, we will introduce researchers to state-of-the-art methods for constructing vector spaces specialised for a broad range of downstream NLP applications. We will deliver a detailed survey of the proposed methods and discuss best practices for intrinsic and application-oriented evaluation of such vector spaces. Throughout the tutorial, we will provide running examples reaching beyond English as the only (and probably the easiest) use-case language, in order to demonstrate the applicability and modelling challenges of current representation learning architectures in other languages.

SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity
Jose Camacho-Collados | Mohammad Taher Pilehvar | Nigel Collier | Roberto Navigli
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High quality datasets were manually curated for the five languages with high inter-annotator agreements (consistently in the 0.9 ballpark). These were used for semi-automatic construction of ten cross-lingual datasets. 17 teams participated in the task, submitting 24 systems in subtask 1 and 14 systems in subtask 2. Results show that systems that combine statistical knowledge from text corpora, in the form of word embeddings, and external knowledge from lexical resources are best performers in both subtasks. More information can be found on the task website: http://alt.qcri.org/semeval2017/task2/

Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications
Jose Camacho-Collados | Mohammad Taher Pilehvar
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

2016

De-Conflated Semantic Representations
Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Embeddings for Word Sense Disambiguation: An Evaluation Study
Ignacio Iacobacci | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Improved Semantic Representation for Domain-Specific Entities
Mohammad Taher Pilehvar | Nigel Collier
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

SemEval-2016 Task 14: Semantic Taxonomy Enrichment
David Jurgens | Mohammad Taher Pilehvar
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Semantic Similarity Frontiers: From Concepts to Documents
David Jurgens | Mohammad Taher Pilehvar
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Semantic similarity forms a central component in many NLP systems, from lexical semantics, to part of speech tagging, to social media analysis. Recent years have seen a renewed interest in developing new similarity techniques, buoyed in part by work on embeddings and by SemEval tasks in Semantic Textual Similarity and Cross-Level Semantic Similarity. The increased interest has led to hundreds of techniques for measuring semantic similarity, which makes it difficult for practitioners to identify which state-of-the-art techniques are applicable and easily integrated into projects and for researchers to identify which aspects of the problem require future research. This tutorial synthesizes the current state of the art for measuring semantic similarity for all types of conceptual or textual pairs and presents a broad overview of current techniques, what resources they use, and the particular inputs or domains to which the methods are most applicable. We survey methods ranging from corpus-based approaches operating on massive or domain-specific corpora to those leveraging structural information from expert-based or collaboratively-constructed lexical resources. Furthermore, we review work on multiple similarity tasks from sense-based comparisons to word, sentence, and document-sized comparisons and highlight general-purpose methods capable of comparing multiple types of inputs. Where possible, we also identify techniques that have been demonstrated to successfully operate in multilingual or cross-lingual settings. Our tutorial provides a clear overview of currently-available tools and their strengths for practitioners who need out of the box solutions and provides researchers with an understanding of the limitations of current state of the art and what open problems remain in the field.
Given the breadth of available approaches, participants will also receive a detailed bibliography of approaches (including those not directly covered in the tutorial), annotated according to the approaches' abilities, and pointers to where open-source implementations of the algorithms may be obtained.

NASARI: a Novel Approach to a Semantically-Aware Representation of Items
José Camacho-Collados | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Reserating the awesometastic: An automatic extension of the WordNet taxonomy for novel terms
David Jurgens | Mohammad Taher Pilehvar
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

An Open-source Framework for Multi-level Semantic Similarity Measurement
Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

SensEmbed: Learning Sense Embeddings for Word and Relational Similarity
Ignacio Iacobacci | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A Unified Multilingual Semantic Representation of Concepts
José Camacho-Collados | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets
José Camacho-Collados | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

SemEval-2014 Task 3: Cross-Level Semantic Similarity
David Jurgens | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

A Large-Scale Pseudoword-Based Evaluation Framework for State-of-the-Art Word Sense Disambiguation
Mohammad Taher Pilehvar | Roberto Navigli
Computational Linguistics, Volume 40, Issue 4 - December 2014

A Robust Approach to Aligning Heterogeneous Lexical Resources
Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
Mohammad Taher Pilehvar | David Jurgens | Roberto Navigli
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Paving the Way to a Large-scale Pseudosense-annotated Dataset
Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies