Kepa Sarasola

Also published as: K. Sarasola, K Sarasola


2018

Konbitzul: an MWE-specific database for Spanish-Basque
Uxoa Iñurrieta | Itziar Aduriz | Arantza Díaz de Ilarraza | Gorka Labaka | Kepa Sarasola
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Rule-Based Translation of Spanish Verb-Noun Combinations into Basque
Uxoa Iñurrieta | Itziar Aduriz | Arantza Díaz de Ilarraza | Gorka Labaka | Kepa Sarasola
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

This paper presents a method to improve the translation of Verb-Noun Combinations (VNCs) in a rule-based Machine Translation (MT) system for Spanish-Basque. Linguistic information about a set of VNCs is gathered from the public database Konbitzul and integrated into the MT system, which improves BLEU, NIST and TER scores and yields clearly better results according to human evaluators.

2016

Domain Adaptation in MT Using Titles in Wikipedia as a Parallel Corpus: Resources and Evaluation
Gorka Labaka | Iñaki Alegria | Kepa Sarasola
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents how a state-of-the-art SMT system is enriched by using extra in-domain parallel corpora extracted from Wikipedia. We collect corpora from parallel titles and from parallel fragments in comparable articles from Wikipedia. We carried out an evaluation with a double objective: evaluating the quality of the extracted data and evaluating the improvement due to the domain adaptation. We think this can be very useful for languages with a limited amount of parallel corpora, where in-domain data is crucial to improve the performance of MT systems. The experiments on the Spanish-English language pair improve on a baseline trained with the Europarl corpus by more than 2 BLEU points when translating in the Computer Science domain.

Using Linguistic Data for English and Spanish Verb-Noun Combination Identification
Uxoa Iñurrieta | Arantza Díaz de Ilarraza | Gorka Labaka | Kepa Sarasola | Itziar Aduriz | John Carroll
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We present a linguistic analysis of a set of English and Spanish verb+noun combinations (VNCs), and a method to use this information to improve VNC identification. Firstly, a sample of frequent VNCs is analysed in depth and tagged along lexico-semantic and morphosyntactic dimensions, obtaining satisfactory inter-annotator agreement scores. Then, a VNC identification experiment is undertaken, where the analysed linguistic data is combined with chunking information and syntactic dependencies. A comparison between the results of the experiment and the results obtained by a basic detection method shows that VNC identification can be greatly improved by using linguistic information, as a large number of additional occurrences are detected with high precision.

2015

Exploiting portability to build an RBMT prototype for a new source language
Nora Aranberri | Gorka Labaka | Arantza Díaz de Ilarraza | Kepa Sarasola
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Building hybrid machine translation systems by using an EBMT preprocessor to create partial translations
Mikel Artetxe | Gorka Labaka | Kepa Sarasola
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2012

Contribution of Complex Lexical Information to Solve Syntactic Ambiguity in Basque
Aitziber Atutxa | Eneko Agirre | Kepa Sarasola
Proceedings of COLING 2012

2009

Use of Rich Linguistic Information to Translate Prepositions and Grammar Cases to Basque
Eneko Agirre | Aitziber Atutxa | Gorka Labaka | Mikel Lersundi | Aingeru Mayor | Kepa Sarasola
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

Relevance of Different Segmentation Options on Spanish-Basque SMT
Arantza Díaz de Ilarraza | Gorka Labaka | Kepa Sarasola
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

2008

Strategies for sustainable MT for Basque: incremental design, reusability, standardization and open-source
I. Alegria | X. Arregi | X. Artola | A. Diaz de Ilarraza | G. Labaka | M. Lersundi | A. Mayor | K. Sarasola
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

2005

An open-source shallow-transfer machine translation engine for the Romance languages of Spain
Antonio M. Corbi-Bellot | Mikel L. Forcada | Sergio Ortíz-Rojas | Juan Antonio Pérez-Ortiz | Gema Ramírez-Sánchez | Felipe Sánchez-Martínez | Iñaki Alegria | Aingeru Mayor | Kepa Sarasola
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

2004

Exploring Portability of Syntactic Information from English to Basque
Eneko Agirre | Aitziber Atutxa | Koldo Gojenola | Kepa Sarasola
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Learning Argument/Adjunct Distinction for Basque
Izaskun Aldezabal | Maxux Aranzabe | Koldo Gojenola | Kepa Sarasola | Aitziber Atutxa
Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition

Semiautomatic Labelling of Semantic Features
Arantza Díaz de Ilarraza | Aingeru Mayor | Kepa Sarasola
COLING 2002: The 19th International Conference on Computational Linguistics

2000

A Word-level Morphosyntactic Analyzer for Basque
I. Aduriz | E. Agirre | I. Aldezabal | X. Arregi | J. M. Arriola | X. Artola | K. Gojenola | A. Maritxalar | K. Sarasola | M. Urkia
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

A word-grammar based morphological analyzer for agglutinative languages
I. Aduriz | E. Agirre | I. Aldezabal | I. Alegria | X. Arregi | J. M. Arriola | X. Artola | K. Gojenola | A. Maritxalar | K. Sarasola | M. Urkia
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

A Bootstrapping Approach to Parser Development
Izaskun Aldezabal | Koldo Gojenola | Kepa Sarasola
Proceedings of the Sixth International Workshop on Parsing Technologies

This paper presents a robust parsing system for unrestricted Basque texts. It analyzes a sentence in two stages: a unification-based parser builds basic syntactic units such as NPs, PPs, and sentential complements, while a finite-state parser performs syntactic disambiguation and filtering of the results. The system has been applied to the acquisition of verbal subcategorization information, obtaining 66% recall and 87% precision in the determination of verb subcategorization instances. This information will later be incorporated into the parser in order to improve its performance.

1998

Towards a single proposal in spelling correction
Eneko Agirre | Koldo Gojenola | Kepa Sarasola | Atro Voutilainen
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Towards a Single Proposal in Spelling Correction
Eneko Agirre | Koldo Gojenola | Kepa Sarasola | Atro Voutilainen
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

1994

Lexical Knowledge Representation in an Intelligent Dictionary Help System
E. Agirre | X. Arregi | X. Artola | A. Diaz de Ilarraza | K. Sarasola
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

1993

A Morphological Analysis Based Method for Spelling Correction
I. Aduriz | E. Agirre | I. Alegria | X. Arregi | J.M Arriola | X. Artola | A. Diaz de Ilarraza | N. Ezeiza | M. Maritxalar | K. Sarasola | M. Urkia
Sixth Conference of the European Chapter of the Association for Computational Linguistics

1992

XUXEN: A Spelling Checker/Corrector for Basque Based on Two-Level Morphology
E. Agirre | I Alegria | X Arregi | X Artola | A Diaz de Ilarraza | M Maritxalar | K Sarasola | M Urkia
Third Conference on Applied Natural Language Processing