Marianna Apidianaki


2020

MULTISEM at SemEval-2020 Task 3: Fine-tuning BERT for Lexical Meaning
Aina Garí Soler | Marianna Apidianaki
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We present the MULTISEM systems submitted to SemEval 2020 Task 3: Graded Word Similarity in Context (GWSC). We experiment with injecting semantic knowledge into pre-trained BERT models through fine-tuning on lexical semantic tasks related to GWSC. We use existing semantically annotated datasets, and propose to approximate similarity through automatically generated lexical substitutes in context. We participate in both GWSC subtasks and address two languages, English and Finnish. Our best English models occupy the third and fourth positions in the ranking for the two subtasks. Performance is lower for the Finnish models which are mid-ranked in the respective subtasks, highlighting the important role of data availability for fine-tuning.

Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures
Eneko Agirre | Marianna Apidianaki | Ivan Vulić
Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

BERT Knows Punta Cana is not just beautiful, it’s gorgeous: Ranking Scalar Adjectives with Contextualised Representations
Aina Garí Soler | Marianna Apidianaki
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Adjectives like pretty, beautiful and gorgeous describe positive properties of the nouns they modify but with different intensity. These differences are important for natural language understanding and reasoning. We propose a novel BERT-based approach to intensity detection for scalar adjectives. We model intensity by vectors directly derived from contextualised representations and show they can successfully rank scalar adjectives. We evaluate our models both intrinsically, on gold standard datasets, and on an Indirect Question Answering task. Our results demonstrate that BERT encodes rich knowledge about the semantics of scalar adjectives, and is able to provide better quality intensity rankings than static embeddings and previous models with access to dedicated resources.

Controlling the Imprint of Passivization and Negation in Contextualized Representations
Hande Celikkanat | Sami Virpioja | Jörg Tiedemann | Marianna Apidianaki
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Contextualized word representations encode rich information about syntax and semantics, alongside specificities of each context of use. While contextual variation does not always reflect actual meaning shifts, it can still reduce the similarity of embeddings for word instances having the same meaning. We explore the imprint of two specific linguistic alternations, namely passivization and negation, on the representations generated by neural models trained with two different objectives: masked language modeling and translation. Our exploration methodology is inspired by an approach previously proposed for removing societal biases from word vectors. We show that passivization and negation leave their traces on the representations, and that neutralizing this information leads to more similar embeddings for words that should preserve their meaning in the transformation. We also find clear differences in how the respective features generalize across datasets.

Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics
Iryna Gurevych | Marianna Apidianaki | Manaal Faruqui
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

2019

SUM-QE: a BERT-based Summary Quality Estimation Model
Stratos Xenouleas | Prodromos Malakasiotis | Marianna Apidianaki | Ion Androutsopoulos
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose SUM-QE, a novel Quality Estimation model for summarization based on BERT. The model addresses linguistic quality aspects that are only indirectly captured by content-based approaches to summary evaluation, without involving comparison with human references. SUM-QE achieves very high correlations with human ratings, outperforming simpler models addressing these linguistic aspects. Predictions of the SUM-QE model can be used for system development, and to inform users of the quality of automatically produced summaries and other types of generated text.

A Comparison of Context-sensitive Models for Lexical Substitution
Aina Garí Soler | Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

Word embedding representations provide good estimates of word meaning and give state-of-the-art performance in semantic tasks. Embedding approaches differ as to whether and how they account for the context surrounding a word. We present a comparison of different word and context representations on the task of proposing substitutes for a target word in context (lexical substitution). We also experiment with tuning contextualized word embeddings on a dataset of sense-specific instances for each target word. We show that powerful contextualized word representations, which give high performance in several semantics-related tasks, deal less well with the subtle in-context similarity relationships needed for substitution. This is better handled by models trained with this objective in mind, where the inter-dependence between word and context representations is explicitly modeled during training.

Embedding Biomedical Ontologies by Jointly Encoding Network Structure and Textual Node Descriptors
Sotiris Kotitsas | Dimitris Pappas | Ion Androutsopoulos | Ryan McDonald | Marianna Apidianaki
Proceedings of the 18th BioNLP Workshop and Shared Task

Network Embedding (NE) methods, which map network nodes to low-dimensional feature vectors, have wide applications in network analysis and bioinformatics. Many existing NE methods rely only on network structure, overlooking other information associated with the nodes, e.g., text describing the nodes. Recent attempts to combine the two sources of information only consider local network structure. We extend NODE2VEC, a well-known NE method that considers broader network structure, to also consider textual node descriptors using recurrent neural encoders. Our method is evaluated on link prediction in two networks derived from UMLS. Experimental results demonstrate the effectiveness of the proposed approach compared to previous work.

LIMSI-MULTISEM at the IJCAI SemDeep-5 WiC Challenge: Context Representations for Word Usage Similarity Estimation
Aina Garí Soler | Marianna Apidianaki | Alexandre Allauzen
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)

Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification
Reno Kriz | João Sedoc | Marianna Apidianaki | Carolina Zheng | Gaurav Kumar | Eleni Miltsakaki | Chris Callison-Burch
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Sentence simplification is the task of rewriting texts so they are easier to understand. Recent research has applied sequence-to-sequence (Seq2Seq) models to this task, focusing largely on training-time improvements via reinforcement learning and memory augmentation. One of the main problems with applying generic Seq2Seq models for simplification is that these models tend to copy directly from the original sentence, resulting in outputs that are relatively long and complex. We aim to alleviate this issue through the use of two main techniques. First, we incorporate content word complexities, as predicted with a leveled word complexity model, into our loss function during training. Second, we generate a large set of diverse candidate simplifications at test time, and rerank these to promote fluency, adequacy, and simplicity. Here, we measure simplicity through a novel sentence complexity model. These extensions allow our models to perform competitively with state-of-the-art systems while generating simpler sentences. We report standard automatic and human evaluation metrics.

Word Usage Similarity Estimation with Sentence Representations and Automatic Substitutes
Aina Garí Soler | Marianna Apidianaki | Alexandre Allauzen
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Usage similarity estimation addresses the semantic proximity of word instances in different contexts. We apply contextualized (ELMo and BERT) word and sentence embeddings to this task, and propose supervised models that leverage these representations for prediction. Our models are further assisted by lexical substitute annotations automatically assigned to word instances by context2vec, a neural model that relies on a bidirectional LSTM. We perform an extensive comparison of existing word and sentence representations on benchmark datasets addressing both graded and binary similarity. The best performing models outperform previous methods in both settings.

Proceedings of the 13th International Workshop on Semantic Evaluation
Jonathan May | Ekaterina Shutova | Aurelie Herbelot | Xiaodan Zhu | Marianna Apidianaki | Saif M. Mohammad
Proceedings of the 13th International Workshop on Semantic Evaluation

2018

Proceedings of The 12th International Workshop on Semantic Evaluation
Marianna Apidianaki | Saif M. Mohammad | Jonathan May | Ekaterina Shutova | Steven Bethard | Marine Carpuat
Proceedings of The 12th International Workshop on Semantic Evaluation

Simplification Using Paraphrases and Context-Based Lexical Substitution
Reno Kriz | Eleni Miltsakaki | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Lexical simplification involves identifying complex words or phrases that need to be simplified, and recommending simpler meaning-preserving substitutes that can be more easily understood. We propose a complex word identification (CWI) model that exploits both lexical and contextual features, and a simplification mechanism which relies on a word-embedding lexical substitution model to replace the detected complex words with simpler paraphrases. We compare our CWI and lexical simplification models to several baselines, and evaluate the performance of our simplification system against human judgments. The results show that our models are able to detect complex words with higher accuracy than other commonly used methods, and propose good simplification substitutes in context. They also highlight the limited contribution of context features for CWI, which nonetheless improve simplification compared to context-unaware models.

Comparing Constraints for Taxonomic Organization
Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Building a taxonomy from the ground up involves several sub-tasks: selecting terms to include, predicting semantic relations between terms, and selecting a subset of relational instances to keep, given constraints on the taxonomy graph. Methods for this final step – taxonomic organization – vary both in terms of the constraints they impose, and whether they enable discovery of synonymous terms. It is hard to isolate the impact of these factors on the quality of the resulting taxonomy because organization methods are rarely compared directly. In this paper, we present a head-to-head comparison of six taxonomic organization algorithms that vary with respect to their structural and transitivity constraints, and treatment of synonymy. We find that while transitive algorithms out-perform their non-transitive counterparts, the top-performing transitive algorithm is prohibitively slow for taxonomies with as few as 50 entities. We propose a simple modification to a non-transitive optimum branching algorithm to explicitly incorporate synonymy, resulting in a method that is substantially faster than the best transitive algorithm while giving complementary performance.

Automated Paraphrase Lattice Creation for HyTER Machine Translation Evaluation
Marianna Apidianaki | Guillaume Wisniewski | Anne Cocos | Chris Callison-Burch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We propose a variant of a well-known machine translation (MT) evaluation metric, HyTER (Dreyer and Marcu, 2012), which exploits reference translations enriched with meaning equivalent expressions. The original HyTER metric relied on hand-crafted paraphrase networks which restricted its applicability to new data. We test, for the first time, HyTER with automatically built paraphrase lattices. We show that although the metric obtains good results on small and carefully curated data with both manually and automatically selected substitutes, it achieves medium performance on much larger and noisier datasets, demonstrating the limits of the metric for tuning and evaluation of current MT systems.

Learning Scalar Adjective Intensity from Paraphrases
Anne Cocos | Skyler Wharton | Ellie Pavlick | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Adjectives like “warm”, “hot”, and “scalding” all describe temperature but differ in intensity. Understanding these differences between adjectives is a necessary part of reasoning about natural language. We propose a new paraphrase-based method to automatically learn the relative intensity relation that holds between a pair of scalar adjectives. Our approach analyzes over 36k adjectival pairs from the Paraphrase Database under the assumption that, for example, paraphrase pair “really hot” <–> “scalding” suggests that “hot” < “scalding”. We show that combining this paraphrase evidence with existing, complementary pattern- and lexicon-based approaches improves the quality of systems for automatically ordering sets of scalar adjectives and inferring the polarity of indirect answers to “yes/no” questions.

Magnitude: A Fast, Efficient Universal Vector Embedding Utility Package
Ajay Patel | Alexander Sands | Chris Callison-Burch | Marianna Apidianaki
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Vector space embedding models like word2vec, GloVe, and fastText are extremely popular representations in natural language processing (NLP) applications. We present Magnitude, a fast, lightweight tool for utilizing and processing embeddings. Magnitude is an open source Python package with a compact vector storage file format that allows for efficient manipulation of huge numbers of embeddings. Magnitude performs common operations up to 60 to 6,000 times faster than Gensim. Magnitude introduces several novel features for improved robustness like out-of-vocabulary lookups.

A comparative study of word embeddings and other features for lexical complexity detection in French
Aina Garí Soler | Marianna Apidianaki | Alexandre Allauzen
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Lexical complexity detection is an important step for automatic text simplification which serves to make informed lexical substitutions. In this study, we experiment with word embeddings for measuring the complexity of French words and combine them with other features that have been shown to be well-suited for complexity prediction. Our results on a synonym ranking task show that embeddings perform better than other features in isolation, but do not outperform frequency-based systems in this language.

2017

Learning Antonyms with Paraphrases and a Morphology-Aware Neural Network
Sneha Rajana | Chris Callison-Burch | Marianna Apidianaki | Vered Shwartz
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

Recognizing and distinguishing antonyms from other types of semantic relations is an essential part of language understanding systems. In this paper, we present a novel method for deriving antonym pairs using paraphrase pairs containing negation markers. We further propose a neural network model, AntNET, that integrates morphological features indicative of antonymy into a path-based relation detection algorithm. We demonstrate that our model outperforms state-of-the-art models in distinguishing antonyms from other semantic relations and is capable of efficiently handling multi-word expressions.

Mapping the Paraphrase Database to WordNet
Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

WordNet has facilitated important research in natural language processing but its usefulness is somewhat limited by its relatively small lexical coverage. The Paraphrase Database (PPDB) covers 650 times more words, but lacks the semantic structure of WordNet that would make it more directly useful for downstream tasks. We present a method for mapping words from PPDB to WordNet synsets with 89% accuracy. The mapping also lays important groundwork for incorporating WordNet’s relations into PPDB so as to increase its utility for semantic reasoning in applications.

Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
Steven Bethard | Marine Carpuat | Marianna Apidianaki | Saif M. Mohammad | Daniel Cer | David Jurgens
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Word Sense Filtering Improves Embedding-Based Lexical Substitution
Anne Cocos | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

The role of word sense disambiguation in lexical substitution has been questioned due to the high performance of vector space models which propose good substitutes without explicitly accounting for sense. We show that a filtering mechanism based on a sense inventory optimized for substitutability can improve the results of these models. Our sense inventory is constructed using a clustering method which generates paraphrase clusters that are congruent with lexical substitution annotations in a development set. The results show that lexical substitution can still benefit from senses which can improve the output of vector space paraphrase ranking models.

Learning Translations via Matrix Completion
Derry Tanti Wijaya | Brendan Callahan | John Hewitt | Jie Gao | Xiao Ling | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Bilingual Lexicon Induction is the task of learning word translations without bilingual parallel corpora. We model this task as a matrix completion problem, and present an effective and extendable framework for completing the matrix. This method harnesses diverse bilingual and monolingual signals, each of which may be incomplete or noisy. Our model achieves state-of-the-art performance for both high and low resource languages.

KnowYourNyms? A Game of Semantic Relationships
Ross Mechanic | Dean Fulgoni | Hannah Cutler | Sneha Rajana | Zheyuan Liu | Bradley Jackson | Anne Cocos | Chris Callison-Burch | Marianna Apidianaki
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Semantic relation knowledge is crucial for natural language understanding. We introduce “KnowYourNyms?”, a web-based game for learning semantic relations. While providing users with an engaging experience, the application collects large amounts of data that can be used to improve semantic relation classifiers. The data also broadly informs us of how people perceive the relationships between words, providing useful insights for research in psychology and linguistics.

2016

Datasets for Aspect-Based Sentiment Analysis in French
Marianna Apidianaki | Xavier Tannier | Cécile Richart
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Aspect Based Sentiment Analysis (ABSA) is the task of mining and summarizing opinions from text about specific entities and their aspects. This article describes two datasets for the development and testing of ABSA systems for French which comprise user reviews annotated with relevant entities, aspects and polarity values. The first dataset contains 457 restaurant reviews (2365 sentences) for training and testing ABSA systems, while the second contains 162 museum reviews (655 sentences) dedicated to out-of-domain evaluation. Both datasets were built as part of SemEval-2016 Task 5 “Aspect-Based Sentiment Analysis” where seven different languages were represented, and are publicly available for research purposes.

Vector-space models for PPDB paraphrase ranking in context
Marianna Apidianaki
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

TransRead: Designing a Bilingual Reading Experience with Machine Translation Technologies
François Yvon | Yong Xu | Marianna Apidianaki | Clément Pillias | Pierre Cubaud
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Proceedings of ACL-2016 System Demonstrations
Sameer Pradhan | Marianna Apidianaki
Proceedings of ACL-2016 System Demonstrations

Lecture bilingue augmentée par des alignements multi-niveaux (Augmenting bilingual reading with alignment information)
François Yvon | Yong Xu | Marianna Apidianaki | Clément Pillias | Pierre Cubaud
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations

The work behind this demonstration combines multilingual language processing tools, in particular automatic alignment, with visualization and interaction techniques. It aims to suggest directions for developing tools that allow the different versions of a text available in several languages to be read simultaneously, with applications in leisure or professional reading.

SemEval-2016 Task 5: Aspect Based Sentiment Analysis
Maria Pontiki | Dimitris Galanis | Haris Papageorgiou | Ion Androutsopoulos | Suresh Manandhar | Mohammad AL-Smadi | Mahmoud Al-Ayyoub | Yanyan Zhao | Bing Qin | Orphée De Clercq | Véronique Hoste | Marianna Apidianaki | Xavier Tannier | Natalia Loukachevitch | Evgeniy Kotelnikov | Nuria Bel | Salud María Jiménez-Zafra | Gülşen Eryiğit
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Word Sense Clustering and Clusterability
Diana McCarthy | Marianna Apidianaki | Katrin Erk
Computational Linguistics, Volume 42, Issue 2 - June 2016

2015

METEOR-WSD: Improved Sense Matching in MT Evaluation
Marianna Apidianaki | Benjamin Marie
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation

Alignment-based sense selection in METEOR and the RATATOUILLE recipe
Benjamin Marie | Marianna Apidianaki
Proceedings of the Tenth Workshop on Statistical Machine Translation

LIMSI: Translations as Source of Indirect Supervision for Multilingual All-Words Sense Disambiguation and Entity Linking
Marianna Apidianaki | Li Gong
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

Global Methods for Cross-lingual Semantic Role and Predicate Labelling
Lonneke van der Plas | Marianna Apidianaki | Chenhua Chen
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Cross-lingual Word Sense Disambiguation for Predicate Labelling of French
Lonneke van der Plas | Marianna Apidianaki
Proceedings of TALN 2014 (Volume 1: Long Papers)

Semantic Clustering of Pivot Paraphrases
Marianna Apidianaki | Emilia Verzeni | Diana McCarthy
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Paraphrases extracted from parallel corpora by the pivot method (Bannard and Callison-Burch, 2005) constitute a valuable resource for multilingual NLP applications. In this study, we analyse the semantics of unigram pivot paraphrases and use a graph-based sense induction approach to unveil hidden sense distinctions in the paraphrase sets. The comparison of the acquired senses to gold data from the Lexical Substitution shared task (McCarthy and Navigli, 2007) demonstrates that sense distinctions exist in the paraphrase sets and highlights the need for a disambiguation step in applications using this resource.

2013

Cross-lingual WSD for Translation Extraction from Comparable Corpora
Marianna Apidianaki | Nikola Ljubešić | Darja Fišer
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

LIMSI : Cross-lingual Word Sense Disambiguation using Translation Sense Clustering
Marianna Apidianaki
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

Applying cross-lingual WSD to wordnet development
Marianna Apidianaki | Benoît Sagot
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The automatic development of semantic resources constitutes an important challenge in the NLP community. The methods used generally exploit existing large-scale resources, such as Princeton WordNet, often combined with information extracted from multilingual resources and parallel corpora. In this paper we show how Cross-Lingual Word Sense Disambiguation can be applied to wordnet development. We apply the proposed method to WOLF, a free wordnet for French still under construction, in order to fill synsets that did not contain any literal yet and increase its coverage.

Boosting the Coverage of a Semantic Lexicon by Automatically Extracted Event Nominalizations
Kata Gábor | Marianna Apidianaki | Benoît Sagot | Éric Villemonte de La Clergerie
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this article, we present a distributional analysis method for extracting nominalization relations from monolingual corpora. The acquisition method makes use of distributional and morphological information to select nominalization candidates. We explain how the learning is performed on a dependency annotated corpus and describe the nominalization results. Furthermore, we show how these results served to enrich an existing lexical resource, the WOLF (Wordnet Libre du Français). We present the techniques that we developed in order to integrate the new information into WOLF, based on both its structure and content. Finally, we evaluate the validity of the automatically obtained information and the correctness of its integration into the semantic resource. The method proved to be useful for boosting the coverage of WOLF and presents the advantage of filling verbal synsets, which are particularly difficult to handle due to the high level of verbal polysemy.

LIMSI @ WMT12
Hai-Son Le | Thomas Lavergne | Alexandre Allauzen | Marianna Apidianaki | Li Gong | Aurélien Max | Artem Sokolov | Guillaume Wisniewski | François Yvon
Proceedings of the Seventh Workshop on Statistical Machine Translation

Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages
Marianna Apidianaki | Ido Dagan | Jennifer Foster | Yuval Marton | Djamé Seddah | Reut Tsarfaty
Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages

WSD for n-best reranking and local language modeling in SMT
Marianna Apidianaki | Guillaume Wisniewski | Artem Sokolov | Aurélien Max | François Yvon
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

Measuring the Adequacy of Cross-Lingual Paraphrases in a Machine Translation Setting
Marianna Apidianaki
Proceedings of COLING 2012: Posters

2011

Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
Dekai Wu | Marianna Apidianaki | Marine Carpuat | Lucia Specia
Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation

Unsupervised Cross-Lingual Lexical Substitution
Marianna Apidianaki
Proceedings of the First workshop on Unsupervised Learning in NLP

Latent Semantic Word Sense Induction and Disambiguation
Tim Van de Cruys | Marianna Apidianaki
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2009

Capturing Lexical Variation in MT Evaluation Using Automatically Built Sense-Cluster Inventories
Marianna Apidianaki | Yifan He | Andy Way
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1

Data-Driven Semantic Analysis for Multilingual WSD and Lexical Selection in Translation
Marianna Apidianaki
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

2008

Translation-oriented Word Sense Induction Based on Parallel Corpora
Marianna Apidianaki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Word Sense Disambiguation (WSD) is an intermediate task that serves as a means to an end defined by the application in which it is to be used. However, different applications have varying disambiguation needs which should have an impact on the choice of the method and of the sense inventory used. The tendency towards application-oriented WSD becomes more and more evident, mostly because of the inadequacy of predefined sense inventories and the inefficacy of application-independent methods in accomplishing specific tasks. In this article, we present a data-driven method of sense induction, which combines contextual and translation information coming from a bilingual parallel training corpus. It consists of an unsupervised method that clusters semantically similar translation equivalents of source language (SL) polysemous words. The created clusters are projected on the SL words revealing their sense distinctions. Clustered equivalents describing a sense of a polysemous word can be considered as more or less commutable translations for an instance of the word carrying this sense. The resulting sense clusters can thus be used for WSD and sense annotation, as well as for lexical selection in translation applications.