Carlos Gómez-Rodríguez


2020

On the Frailty of Universal POS Tags for Neural UD Parsers
Mark Anderson | Carlos Gómez-Rodríguez
Proceedings of the 24th Conference on Computational Natural Language Learning

We present an analysis on the effect UPOS accuracy has on parsing performance. Results suggest that leveraging UPOS tags as features for neural parsers requires a prohibitively high tagging accuracy and that the use of gold tags offers a non-linear increase in performance, suggesting some sort of exceptionality. We also investigate what aspects of predicted UPOS tags impact parsing accuracy the most, highlighting some potentially meaningful linguistic facets of the problem.

Enriched In-Order Linearization for Faster Sequence-to-Sequence Constituent Parsing
Daniel Fernández-González | Carlos Gómez-Rodríguez
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Sequence-to-sequence constituent parsing requires a linearization to represent trees as sequences. Top-down tree linearizations, which can be based on brackets or shift-reduce actions, have achieved the best accuracy to date. In this paper, we show that these results can be improved by using an in-order linearization instead. Based on this observation, we implement an enriched in-order shift-reduce linearization inspired by Vinyals et al. (2015)’s approach, achieving the best accuracy to date on the English PTB dataset among fully-supervised single-model sequence-to-sequence constituent parsers. Finally, we apply deterministic attention mechanisms to match the speed of state-of-the-art transition-based parsers, thus showing that sequence-to-sequence models can match them, not only in accuracy, but also in speed.

Transition-based Semantic Dependency Parsing with Pointer Networks
Daniel Fernández-González | Carlos Gómez-Rodríguez
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Transition-based parsers implemented with Pointer Networks have become the new state of the art in dependency parsing, excelling in producing labelled syntactic trees and outperforming graph-based models in this task. In order to further test the capabilities of these powerful neural networks on a harder NLP problem, we propose a transition system that, thanks to Pointer Networks, can straightforwardly produce labelled directed acyclic graphs and perform semantic dependency parsing. In addition, we enhance our approach with deep contextualized word embeddings extracted from BERT. The resulting system not only outperforms all existing transition-based models, but also matches the best fully-supervised accuracy to date on the SemEval 2015 Task 18 datasets among previous state-of-the-art graph-based parsers.

Cross-Lingual Word Embeddings for Turkic Languages
Elmurod Kuriyozov | Yerai Doval | Carlos Gómez-Rodríguez
Proceedings of the 12th Language Resources and Evaluation Conference

There has been an increasing interest in learning cross-lingual word embeddings to transfer knowledge obtained from a resource-rich language, such as English, to lower-resource languages for which annotated data is scarce, such as Turkish, Russian, and many others. In this paper, we present the first viability study of established techniques to align monolingual embedding spaces for Turkish, Uzbek, Azeri, Kazakh and Kyrgyz, members of the Turkic family which is heavily affected by the low-resource constraint. Those techniques are known to require little explicit supervision, mainly in the form of bilingual dictionaries, hence being easily adaptable to different domains, including low-resource ones. We obtain new bilingual dictionaries and new word embeddings for these languages and show the steps for obtaining cross-lingual word embeddings using state-of-the-art techniques. Then, we evaluate the results using the bilingual dictionary induction task. Our experiments confirm that the obtained bilingual dictionaries outperform previously-available ones, and that word embeddings from a low-resource language can benefit from resource-rich closely-related languages when they are aligned together. Furthermore, evaluation on an extrinsic task (sentiment analysis on Uzbek) proves that monolingual word embeddings can, although slightly, benefit from cross-lingual alignments.

Inherent Dependency Displacement Bias of Transition-Based Algorithms
Mark Anderson | Carlos Gómez-Rodríguez
Proceedings of the 12th Language Resources and Evaluation Conference

A wide variety of transition-based algorithms are currently used for dependency parsers. Empirical studies have shown that performance varies across different treebanks in such a way that one algorithm outperforms another on one treebank and the reverse is true for a different treebank. There is often no discernible reason for what causes one algorithm to be more suitable for a certain treebank and less so for another. In this paper we shed some light on this by introducing the concept of an algorithm’s inherent dependency displacement distribution. This characterises the bias of the algorithm in terms of dependency displacement, which quantifies both distance and direction of syntactic relations. We show that the similarity of an algorithm’s inherent distribution to a treebank’s displacement distribution is clearly correlated to the algorithm’s parsing performance on that treebank, specifically with highly significant and substantial correlations for the predominant sentence lengths in Universal Dependency treebanks. We also obtain results which show a more discrete analysis of dependency displacement does not result in any meaningful correlations.
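
Illustrative sketch (not taken from the paper): one way to compute a displacement distribution for a single annotated sentence. It assumes that displacement is the head position minus the dependent position, so the sign gives the direction and the magnitude gives the distance; the sign convention, the function name `displacement_distribution`, and the head-list input format are assumptions made for this example.

```python
from collections import Counter

def displacement_distribution(heads):
    """Return the relative frequency of each signed head-dependent displacement.

    heads[i] is the 1-based position of the head of word i+1; 0 marks the root.
    Displacement is taken as (head position - dependent position), which is an
    assumed convention: positive means the head is to the right of the dependent.
    """
    counts = Counter(
        head - dep
        for dep, head in enumerate(heads, start=1)
        if head != 0  # the root attachment has no displacement
    )
    total = sum(counts.values())
    return {disp: count / total for disp, count in sorted(counts.items())}

# Toy sentence with four words: word 2 is the root, words 1 and 4 attach to
# word 2, and word 3 attaches to word 4.
example_heads = [2, 0, 4, 2]
distribution = displacement_distribution(example_heads)
```

Aggregating these counts over all sentences of a treebank would give the treebank-side distribution; how the algorithm-side distribution is obtained and how the two are compared is described in the paper itself and is not reproduced here.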

Bracketing Encodings for 2-Planar Dependency Parsing
Michalina Strzyz | David Vilares | Carlos Gómez-Rodríguez
Proceedings of the 28th International Conference on Computational Linguistics

We present a bracketing-based encoding that can be used to represent any 2-planar dependency tree over a sentence of length n as a sequence of n labels, hence providing almost total coverage of crossing arcs in sequence labeling parsing. First, we show that existing bracketing encodings for parsing as labeling can only handle a very mild extension of projective trees. Second, we overcome this limitation by taking into account the well-known property of 2-planarity, which is present in the vast majority of dependency syntactic structures in treebanks, i.e., the arcs of a dependency tree can be split into two planes such that arcs in a given plane do not cross. We take advantage of this property to design a method that balances the brackets and that encodes the arcs belonging to each of those planes, allowing for almost unrestricted non-projectivity (∼99.9% coverage) in sequence labeling parsing. The experiments show that our linearizations improve over the accuracy of the original bracketing encoding in highly non-projective treebanks (on average by 0.4 LAS), while achieving a similar speed. Also, they are especially suitable when PoS tags are not used as input parameters to the models.

A Unifying Theory of Transition-based and Sequence Labeling Parsing
Carlos Gómez-Rodríguez | Michalina Strzyz | David Vilares
Proceedings of the 28th International Conference on Computational Linguistics

We define a mapping from transition-based parsing algorithms that read sentences from left to right to sequence labeling encodings of syntactic trees. This not only establishes a theoretical relation between transition-based parsing and sequence-labeling parsing, but also provides a method to obtain new encodings for fast and simple sequence labeling parsing from the many existing transition-based parsers for different formalisms. Applying it to dependency parsing, we implement sequence labeling versions of four algorithms, showing that they are learnable and obtain comparable performance to existing encodings.

Data Augmentation via Subtree Swapping for Dependency Parsing of Low-Resource Languages
Mathieu Dehouck | Carlos Gómez-Rodríguez
Proceedings of the 28th International Conference on Computational Linguistics

The lack of annotated data is a big issue for building reliable NLP systems for most of the world’s languages. But this problem can be alleviated by automatic data generation. In this paper, we present a new data augmentation method for artificially creating new dependency-annotated sentences. The main idea is to swap subtrees between annotated sentences while enforcing strong constraints on those trees to ensure maximal grammaticality of the new sentences. We also propose a method to perform low-resource experiments using resource-rich languages by mimicking low-resource languages by sampling sentences under a low-resource distribution. In a series of experiments, we show that our newly proposed data augmentation method outperforms previous proposals using the same basic inputs.

Discontinuous Constituent Parsing as Sequence Labeling
David Vilares | Carlos Gómez-Rodríguez
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

This paper reduces discontinuous parsing to sequence labeling. It first shows that existing reductions for constituent parsing as labeling do not support discontinuities. Second, it fills this gap and proposes to encode tree discontinuities as nearly ordered permutations of the input sequence. Third, it studies whether such discontinuous representations are learnable. The experiments show that despite the architectural simplicity, under the right representation, the models are fast and accurate.

Distilling Neural Networks for Greener and Faster Dependency Parsing
Mark Anderson | Carlos Gómez-Rodríguez
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

The carbon footprint of natural language processing research has been increasing in recent years due to its reliance on large and inefficient neural network implementations. Distillation is a network compression technique which attempts to impart knowledge from a large model to a smaller one. We use teacher-student distillation to improve the efficiency of the Biaffine dependency parser which obtains state-of-the-art performance with respect to accuracy and parsing speed (Dozat and Manning, 2017). When distilling to 20% of the original model’s trainable parameters, we only observe an average decrease of ∼1 point for both UAS and LAS across a number of diverse Universal Dependency treebanks while being 2.30x (1.19x) faster than the baseline model on CPU (GPU) at inference time. We also observe a small increase in performance when compressing to 80% for some treebanks. Finally, through distillation we attain a parser which is not only faster but also more accurate than the fastest modern parser on the Penn Treebank.

Efficient EUD Parsing
Mathieu Dehouck | Mark Anderson | Carlos Gómez-Rodríguez
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

We present the system submission from the FASTPARSE team for the EUD Shared Task at IWPT 2020. We engaged with the task by focusing on efficiency. For this we considered training costs and inference efficiency. Our models are a combination of distilled neural dependency parsers and a rule-based system that projects UD trees into EUD graphs. We obtained an average ELAS of 74.04 for our official submission, ranking 4th overall.

Bringing Roguelikes to Visually-Impaired Players by Using NLP
Jesús Vilares | Carlos Gómez-Rodríguez | Luís Fernández-Núñez | Darío Penas | Jorge Viteri
Workshop on Games and Natural Language Processing

Although the roguelike video game genre has a large community of fans (both players and developers) and the graphic aspect of these games is usually given little relevance (ASCII-based graphics are not rare even today), their accessibility for blind players and other visually-impaired users remains a pending issue. In this document, we describe an initiative for the development of roguelikes adapted to visually-impaired players by using Natural Language Processing techniques, together with the first completed games resulting from it. These games were developed as Bachelor’s and Master’s theses. Our approach consists in integrating a multilingual module that, apart from the classic ASCII-based graphical interface, automatically generates text descriptions of what is happening within the game. The visually-impaired user can then read such descriptions by means of a screen reader. In these projects we seek expressivity and variety in the descriptions, so we can offer the users a fun roguelike experience that does not sacrifice any of the key characteristics that define the genre. Moreover, we intend to make these projects easy to extend to other languages, thus avoiding costly and complex solutions. KEYWORDS: Natural Language Generation, roguelikes, visually-impaired users

2019

Towards Making a Dependency Parser See
Michalina Strzyz | David Vilares | Carlos Gómez-Rodríguez
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We explore whether it is possible to leverage eye-tracking data in an RNN dependency parser (for English) when such information is only available during training - i.e. no aggregated or token-level gaze features are used at inference time. To do so, we train a multitask learning model that parses sentences as sequence labeling and leverages gaze features as auxiliary tasks. Our method also learns to train from disjoint datasets, i.e. it can be used to test whether already collected gaze features are useful to improve the performance on new non-gazed annotated treebanks. Accuracy gains are modest but positive, showing the feasibility of the approach. It can serve as a first step towards architectures that can better leverage eye-tracking data or other complementary information available only for training sentences, possibly leading to improvements in syntactic parsing.

HEAD-QA: A Healthcare Dataset for Complex Reasoning
David Vilares | Carlos Gómez-Rodríguez
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We present HEAD-QA, a multi-choice question answering testbed to encourage research on complex reasoning. The questions come from exams to access a specialized position in the Spanish healthcare system, and are challenging even for highly specialized humans. We then consider monolingual (Spanish) and cross-lingual (to English) experiments with information retrieval and neural techniques. We show that: (i) HEAD-QA challenges current methods, and (ii) the results lag well behind human performance, demonstrating its usefulness as a benchmark for future work.

Sequence Labeling Parsing by Learning across Representations
Michalina Strzyz | David Vilares | Carlos Gómez-Rodríguez
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We use parsing as sequence labeling as a common framework to learn across constituency and dependency syntactic abstractions. To do so, we cast the problem as multitask learning (MTL). First, we show that adding a parsing paradigm as an auxiliary loss consistently improves the performance on the other paradigm. Secondly, we explore an MTL sequence labeling model that parses both representations, at almost no cost in terms of performance and speed. The results across the board show that on average MTL models with auxiliary losses for constituency parsing outperform single-task ones by 1.05 F1 points, and for dependency parsing by 0.62 UAS points.

Artificially Evolved Chunks for Morphosyntactic Analysis
Mark Anderson | David Vilares | Carlos Gómez-Rodríguez
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

Left-to-Right Dependency Parsing with Pointer Networks
Daniel Fernández-González | Carlos Gómez-Rodríguez
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We propose a novel transition-based algorithm that straightforwardly parses sentences from left to right by building n attachments, with n being the length of the input sentence. Similarly to the recent stack-pointer parser by Ma et al. (2018), we use the pointer network framework that, given a word, can directly point to a position from the sentence. However, our left-to-right approach is simpler than the original top-down stack-pointer parser (not requiring a stack) and reduces transition sequence length in half, from 2n-1 actions to n. This results in a quadratic non-projective parser that runs twice as fast as the original while achieving the best accuracy to date on the English PTB dataset (96.04% UAS, 94.43% LAS) among fully-supervised single-model dependency parsers, and improves over the former top-down transition system in the majority of languages tested.

Viable Dependency Parsing as Sequence Labeling
Michalina Strzyz | David Vilares | Carlos Gómez-Rodríguez
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We recast dependency parsing as a sequence labeling problem, exploring several encodings of dependency trees as labels. While dependency parsing by means of sequence labeling had been attempted in existing work, results suggested that the technique was impractical. We show instead that with a conventional BILSTM-based model it is possible to obtain fast and accurate parsers. These parsers are conceptually simple, not needing traditional parsing algorithms or auxiliary structures. However, experiments on the PTB and a sample of UD treebanks show that they provide a good speed-accuracy tradeoff, with results competitive with more complex approaches.

Harry Potter and the Action Prediction Challenge from Natural Language
David Vilares | Carlos Gómez-Rodríguez
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We explore the challenge of action prediction from textual descriptions of scenes, a testbed to approximate whether text inference can be used to predict upcoming actions. As a case study, we consider the world of the Harry Potter fantasy novels and inferring what spell will be cast next given a fragment of a story. Spells act as keywords that abstract actions (e.g. ‘Alohomora’ to open a door) and denote a response to the environment. This idea is used to automatically build HPAC, a corpus containing 82,836 samples and 85 actions. We then evaluate different baselines. Among the tested models, an LSTM-based approach obtains the best performance for frequent actions and large scene descriptions, but approaches such as logistic regression behave well on infrequent actions.

2018

A Transition-Based Algorithm for Unrestricted AMR Parsing
David Vilares | Carlos Gómez-Rodríguez
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Non-projective parsing can be useful to handle cycles and reentrancy in AMR graphs. We explore this idea and introduce a greedy left-to-right non-projective transition-based parser. At each parsing configuration, an oracle decides whether to create a concept or whether to connect a pair of existing concepts. The algorithm handles reentrancy and arbitrary cycles natively, i.e. within the transition system itself. The model is evaluated on the LDC2015E86 corpus, obtaining results close to the state of the art, including a Smatch of 64%, and showing good behavior on reentrant edges.

A Dynamic Oracle for Linear-Time 2-Planar Dependency Parsing
Daniel Fernández-González | Carlos Gómez-Rodríguez
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We propose an efficient dynamic oracle for training the 2-Planar transition-based parser, a linear-time parser with over 99% coverage on non-projective syntactic corpora. This novel approach outperforms the static training strategy in the vast majority of languages tested and scored better on most datasets than the arc-hybrid parser enhanced with the Swap transition, which can handle unrestricted non-projectivity.

Improving Coverage and Runtime Complexity for Exact Inference in Non-Projective Transition-Based Dependency Parsers
Tianze Shi | Carlos Gómez-Rodríguez | Lillian Lee
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We generalize Cohen, Gómez-Rodríguez, and Satta’s (2011) parser to a family of non-projective transition-based dependency parsers allowing polynomial-time exact inference. This includes novel parsers with better coverage than Cohen et al. (2011), and even a variant that reduces time complexity to O(n^6), improving over the known bounds in exact inference for non-projective transition-based parsing. We hope that this piece of theoretical work inspires design of novel transition systems with better coverage and better run-time guarantees.

Non-Projective Dependency Parsing with Non-Local Transitions
Daniel Fernández-González | Carlos Gómez-Rodríguez
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We present a novel transition system, based on the Covington non-projective parser, introducing non-local transitions that can directly create arcs involving nodes to the left of the current focus positions. This avoids the need for long sequences of No-Arcs transitions to create long-distance arcs, thus alleviating error propagation. The resulting parser outperforms the original version and achieves the best accuracy on the Stanford Dependencies conversion of the Penn Treebank among greedy transition-based parsers.

Dynamic Oracles for Top-Down and In-Order Shift-Reduce Constituent Parsing
Daniel Fernández-González | Carlos Gómez-Rodríguez
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We introduce novel dynamic oracles for training two of the most accurate known shift-reduce algorithms for constituent parsing: the top-down and in-order transition-based parsers. In both cases, the dynamic oracles manage to notably increase their accuracy, in comparison to that obtained by performing classic static training. In addition, by improving the performance of the state-of-the-art in-order shift-reduce parser, we achieve the best accuracy to date (92.0 F1) obtained by a fully-supervised single-model greedy shift-reduce constituent parser on the WSJ benchmark.

Constituent Parsing as Sequence Labeling
Carlos Gómez-Rodríguez | David Vilares
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We introduce a method to reduce constituent parsing to sequence labeling. For each word w_t, it generates a label that encodes: (1) the number of ancestors in the tree that the words w_t and w_{t+1} have in common, and (2) the nonterminal symbol at the lowest common ancestor. We first prove that the proposed encoding function is injective for any tree without unary branches. In practice, the approach is made extensible to all constituency trees by collapsing unary branches. We then use the PTB and CTB treebanks as testbeds and propose a set of fast baselines. We achieve 90% F-score on the PTB test set, outperforming the Vinyals et al. (2015) sequence-to-sequence parser. In addition, sacrificing some accuracy, our approach achieves the fastest constituent parsing speeds reported to date on PTB by a wide margin.
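
Illustrative sketch (not the authors' implementation): the two-part label described in this abstract can be computed for a toy tree as follows. The nested-tuple tree format, the helper names `leaf_paths` and `encode`, and the choice to emit no label for the final word are assumptions made for this example; the tree has no unary branches, matching the condition stated in the abstract.

```python
def leaf_paths(tree, address=(), ancestors=()):
    """Yield (word, ancestors) for each leaf, left to right.

    A tree is a tuple (label, child, child, ...); a leaf is a plain string.
    Each ancestor is stored as (address, label), where the address is the
    tuple of child indices from the root, so that two distinct nodes with
    the same label are never confused.
    """
    label, children = tree[0], tree[1:]
    ancestors = ancestors + ((address, label),)
    for index, child in enumerate(children):
        if isinstance(child, str):
            yield child, ancestors
        else:
            yield from leaf_paths(child, address + (index,), ancestors)

def encode(tree):
    """Label each word except the last with (word, n_common, lca_label)."""
    leaves = list(leaf_paths(tree))
    labels = []
    for (word, path_a), (_, path_b) in zip(leaves, leaves[1:]):
        # Count shared ancestors: the length of the common prefix of the
        # two root-to-leaf ancestor paths.
        common = 0
        for node_a, node_b in zip(path_a, path_b):
            if node_a != node_b:
                break
            common += 1
        # The lowest common ancestor is the last node of the shared prefix.
        labels.append((word, common, path_a[common - 1][1]))
    return labels

toy_tree = ("S", ("NP", "The", "cat"), ("VP", "chased", ("NP", "a", "mouse")))
toy_labels = encode(toy_tree)
# toy_labels == [("The", 2, "NP"), ("cat", 1, "S"),
#                ("chased", 2, "VP"), ("a", 3, "NP")]
```

The sketch uses absolute ancestor counts, as in the description above, and covers only the encoding direction; decoding labels back into a tree and the collapsing of unary branches are left out.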

Global Transition-based Non-projective Dependency Parsing
Carlos Gómez-Rodríguez | Tianze Shi | Lillian Lee
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Shi, Huang, and Lee (2017a) obtained state-of-the-art results for English and Chinese dependency parsing by combining dynamic-programming implementations of transition-based dependency parsers with a minimal set of bidirectional LSTM features. However, their results were limited to projective parsing. In this paper, we extend their approach to support non-projectivity by providing the first practical implementation of the MH₄ algorithm, an O(n^4) mildly nonprojective dynamic-programming parser with very high coverage on non-projective treebanks. To make MH₄ compatible with minimal transition-based feature sets, we introduce a transition-based interpretation of it in which parser items are mapped to sequences of transitions. We thus obtain the first implementation of global decoding for non-projective transition-based parsing, and demonstrate empirically that it is more effective than its projective counterpart in parsing a number of highly non-projective languages.

Grounding the Semantics of Part-of-Day Nouns Worldwide using Twitter
David Vilares | Carlos Gómez-Rodríguez
Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media

The usage of part-of-day nouns, such as ‘night’, and their time-specific greetings (‘good night’), varies across languages and cultures. We show the possibilities that Twitter offers for studying the semantics of these terms and its variability between countries. We mine a worldwide sample of multilingual tweets with temporal greetings, and study how their frequencies vary in relation with local time. The results provide insights into the semantics of these temporal expressions and the cultural and sociological factors influencing their usage.

Transition-based Parsing with Lighter Feed-Forward Networks
David Vilares | Carlos Gómez-Rodríguez
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

We explore whether it is possible to build lighter parsers, that are statistically equivalent to their corresponding standard version, for a wide set of languages showing different structures and morphologies. As testbed, we use the Universal Dependencies and transition-based dependency parsers trained on feed-forward networks. For these, most existing research assumes de facto standard embedded features and relies on pre-computation tricks to obtain speed-ups. We explore how these features and their size can be reduced and whether this translates into speed-ups with a negligible impact on accuracy. The experiments show that grand-daughter features can be removed for the majority of treebanks without a significant (negative or positive) LAS difference. They also show how the size of the embeddings can be notably reduced.

2017

A Full Non-Monotonic Transition System for Unrestricted Non-Projective Parsing
Daniel Fernández-González | Carlos Gómez-Rodríguez
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Restricted non-monotonicity has been shown beneficial for the projective arc-eager dependency parser in previous research, as posterior decisions can repair mistakes made in previous states due to the lack of information. In this paper, we propose a novel, fully non-monotonic transition system based on the non-projective Covington algorithm. As a non-monotonic system requires exploration of erroneous actions during the training process, we develop several non-monotonic variants of the recently defined dynamic oracle for the Covington parser, based on tight approximations of the loss. Experiments on datasets from the CoNLL-X and CoNLL-XI shared tasks show that a non-monotonic dynamic oracle outperforms the monotonic version in the majority of languages.

Generic Axiomatization of Families of Noncrossing Graphs in Dependency Parsing
Anssi Yli-Jyrä | Carlos Gómez-Rodríguez
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a simple encoding for unlabeled noncrossing graphs and show how its latent counterpart helps us to represent several families of directed and undirected graphs used in syntactic and semantic parsing of natural language as context-free languages. The families are separated purely on the basis of forbidden patterns in latent encoding, eliminating the need to differentiate the families of non-crossing graphs in inference algorithms: one algorithm works for all when the search space can be controlled in parser input.

A non-projective greedy dependency parser with bidirectional LSTMs
David Vilares | Carlos Gómez-Rodríguez
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

The LyS-FASTPARSE team present BIST-COVINGTON, a neural implementation of the Covington (2001) algorithm for non-projective dependency parsing. The bidirectional LSTM approach by Kiperwasser and Goldberg (2016) is used to train a greedy parser with a dynamic oracle to mitigate error propagation. The model participated in the CoNLL 2017 UD Shared Task. In spite of not using any ensemble methods and using the baseline segmentation and PoS tagging, the parser obtained good results on both macro-average LAS and UAS in the big treebanks category (55 languages), ranking 7th out of 33 teams. In the all treebanks category (LAS and UAS) we ranked 16th and 12th. The gap between the all and big categories is mainly due to the poor performance on four parallel PUD treebanks, suggesting that some ‘suffixed’ treebanks (e.g. Spanish-AnCora) perform poorly on cross-treebank settings, which does not occur with the corresponding ‘unsuffixed’ treebank (e.g. Spanish). By changing that, we obtain the 11th best LAS among all runs (official and unofficial). The code is made available at https://github.com/CoNLL-UD-2017/LyS-FASTPARSE

Towards Syntactic Iberian Polarity Classification
David Vilares | Marcos Garcia | Miguel A. Alonso | Carlos Gómez-Rodríguez
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Lexicon-based methods using syntactic rules for polarity classification rely on parsers that are dependent on the language and on treebank guidelines. Thus, rules are also dependent and require adaptation, especially in multilingual scenarios. We tackle this challenge in the context of the Iberian Peninsula, releasing the first symbolic syntax-based Iberian system with rules shared across five official languages: Basque, Catalan, Galician, Portuguese and Spanish. The model is made available.

2016

EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis
David Vilares | Miguel A. Alonso | Carlos Gómez-Rodríguez
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Code-switching texts are those that contain terms in two or more different languages, and they appear increasingly often in social media. The aim of this paper is to provide a resource to the research community to evaluate the performance of sentiment classification techniques on this complex multilingual environment, proposing an English-Spanish corpus of tweets with code-switching (EN-ES-CS CORPUS). The tweets are labeled according to two well-known criteria used for this purpose: SentiStrength and a trinary scale (positive, neutral and negative categories). Preliminary work on the resource is already done, providing a set of baselines for the research community.

pdf bib
One model, two languages: training bilingual parsers with harmonized treebanks
David Vilares | Carlos Gómez-Rodríguez | Miguel A. Alonso
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
LyS at SemEval-2016 Task 4: Exploiting Neural Activation Values for Twitter Sentiment Classification and Quantification
David Vilares | Yerai Doval | Miguel A. Alonso | Carlos Gómez-Rodríguez
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
Squibs: Restricted Non-Projectivity: Coverage vs. Efficiency
Carlos Gómez-Rodríguez
Computational Linguistics, Volume 42, Issue 4 - December 2016

2015

pdf bib
Sentiment Analysis on Monolingual, Multilingual and Code-Switching Twitter Corpora
David Vilares | Miguel A. Alonso | Carlos Gómez-Rodríguez
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
LYSGROUP: Adapting a Spanish microtext normalization system to English.
Yerai Doval Mosquera | Jesús Vilares | Carlos Gómez-Rodríguez
Proceedings of the Workshop on Noisy User-generated Text

pdf bib
An Efficient Dynamic Oracle for Unrestricted Non-Projective Parsing
Carlos Gómez-Rodríguez | Daniel Fernández-González
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

pdf bib
LyS: Porting a Twitter Sentiment Analysis Approach from Spanish to English
David Vilares | Miguel Hermo | Miguel A. Alonso | Carlos Gómez-Rodríguez | Yerai Doval
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Language variety identification in Spanish tweets
Wolfgang Maier | Carlos Gómez-Rodríguez
Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants

pdf bib
A Polynomial-Time Dynamic Oracle for Non-Projective Dependency Parsing
Carlos Gómez-Rodríguez | Francesco Sartorio | Giorgio Satta
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Divisible Transition Systems and Multiplanar Dependency Parsing
Carlos Gómez-Rodríguez | Joakim Nivre
Computational Linguistics, Volume 39, Issue 4 - December 2013

2012

pdf bib
Dependency Parsing with Undirected Graphs
Carlos Gómez-Rodríguez | Daniel Fernández-González
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Improving Transition-Based Dependency Parsing with Buffer Transitions
Daniel Fernández-González | Carlos Gómez-Rodríguez
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
Shay B. Cohen | Carlos Gómez-Rodríguez | Giorgio Satta
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Dynamic Programming Algorithms for Transition-Based Dependency Parsers
Marco Kuhlmann | Carlos Gómez-Rodríguez | Giorgio Satta
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Dependency Parsing Schemata and Mildly Non-Projective Dependency Parsing
Carlos Gómez-Rodríguez | John Carroll | David Weir
Computational Linguistics, Volume 37, Issue 3 - September 2011

2010

pdf bib
Efficient Parsing of Well-Nested Linear Context-Free Rewriting Systems
Carlos Gómez-Rodríguez | Marco Kuhlmann | Giorgio Satta
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Evaluation of Dependency Parsers on Unbounded Dependencies
Joakim Nivre | Laura Rimell | Ryan McDonald | Carlos Gómez-Rodríguez
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
A Transition-Based Parser for 2-Planar Dependency Structures
Carlos Gómez-Rodríguez | Joakim Nivre
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

pdf bib
An Optimal-Time Binarization Algorithm for Linear Context-Free Rewriting Systems with Fan-Out Two
Carlos Gómez-Rodríguez | Giorgio Satta
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Parsing Mildly Non-Projective Dependency Structures
Carlos Gómez-Rodríguez | David Weir | John Carroll
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

pdf bib
Optimal Reduction of Rule Length in Linear Context-Free Rewriting Systems
Carlos Gómez-Rodríguez | Marco Kuhlmann | Giorgio Satta | David Weir
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

pdf bib
A Deductive Approach to Dependency Parsing
Carlos Gómez-Rodríguez | John Carroll | David Weir
Proceedings of ACL-08: HLT

2006

pdf bib
Generating XTAG Parsers from Algebraic Specifications
Carlos Gómez-Rodríguez | Miguel A. Alonso | Manuel Vilares
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms