João Graça

Also published as: Joao Graca, João V. Graça


2020

Project MAIA: Multilingual AI Agent Assistant
André F. T. Martins | Joao Graca | Paulo Dimas | Helena Moniz | Graham Neubig
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper presents the Multilingual Artificial Intelligence Agent Assistant (MAIA), a project led by Unbabel with the collaboration of CMU, INESC-ID and IT Lisbon. MAIA will employ cutting-edge machine learning and natural language processing technologies to build multilingual AI agent assistants, eliminating language barriers. MAIA’s translation layer will empower human agents to provide customer support in real-time, in any language, with human quality.

2018

Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing
Ramón Astudillo | João Graça | André Martins

Unbabel: How to combine AI with the crowd to scale professional-quality translation
João Graça
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing

2012

A PropBank for Portuguese: the CINTIL-PropBank
António Branco | Catarina Carvalheiro | Sílvia Pereira | Sara Silveira | João Silva | Sérgio Castro | João Graça
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

With the CINTIL-International Corpus of Portuguese, an ongoing corpus annotated with fully fledged grammatical representations, sentences receive not only a high level of lexical, morphological and syntactic annotation but also a semantic analysis that prepares the data for a manual specification step, thus opening the way for a number of tools and resources that are currently a major research focus. This paper reports on the construction of a PropBank that builds on CINTIL-DeepGramBank, with nearly 10 thousand sentences, on the basis of a deep linguistic grammar, and on the process and linguistic criteria guiding that construction, which make it possible to obtain a complete PropBank with both syntactic and semantic levels of linguistic annotation. Given this, and the promising inter-annotator agreement scores presented in this study, CINTIL-PropBank is a valuable resource for training a semantic role labeller, one of the goals of this project.

Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure
Trevor Cohn | Phil Blunsom | Joao Graca

The PASCAL Challenge on Grammar Induction
Douwe Gelling | Trevor Cohn | Phil Blunsom | João Graça
Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

Entropy-based Pruning for Phrase-based Machine Translation
Wang Ling | João Graça | Isabel Trancoso | Alan Black
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Wiki-ly Supervised Part-of-Speech Tagging
Shen Li | João Graça | Ben Taskar
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Young-Bum Kim | João Graça | Benjamin Snyder
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Discriminative Phrase-based Lexicalized Reordering Models using Weighted Reordering Graphs
Wang Ling | João Graça | David Martins de Matos | Isabel Trancoso | Alan W Black
Proceedings of 5th International Joint Conference on Natural Language Processing

Reordering Modeling using Weighted Alignment Matrices
Wang Ling | Tiago Luís | João Graça | Isabel Trancoso | Luísa Coheur
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Rich Prior Knowledge in Learning for Natural Language Processing
Gregory Druck | Kuzman Ganchev | João Graça
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

2010

Learning Tractable Word Alignment Models with Complex Constraints
João V. Graça | Kuzman Ganchev | Ben Taskar
Computational Linguistics, Volume 36, Issue 3 - September 2010

Sparsity in Dependency Grammar Induction
Jennifer Gillenwater | Kuzman Ganchev | João Graça | Fernando Pereira | Ben Taskar
Proceedings of the ACL 2010 Conference Short Papers

Developing a Deep Linguistic Databank Supporting a Collection of Treebanks: the CINTIL DeepGramBank
António Branco | Francisco Costa | João Silva | Sara Silveira | Sérgio Castro | Mariana Avelãs | Clara Pinto | João Graça
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Corpora of sentences annotated with grammatical information have been deployed by extending the basic lexical and morphological data with increasingly complex information, such as phrase constituency, syntactic functions, semantic roles, etc. As these corpora grow in size and the linguistic information to be encoded reaches higher levels of sophistication, the use of annotation tools and, above all, of supporting computational grammars is no longer a matter of convenience but of necessity. In this paper, we report on the design features, the development conditions and the methodological options of a deep linguistic databank, the CINTIL DeepGramBank. In this corpus, sentences are annotated with fully fledged, linguistically informed grammatical representations that are produced by a deep linguistic processing grammar, thus consistently integrating morphological, syntactic and semantic information. We also report on how such a corpus makes it possible to straightforwardly obtain a whole range of past-generation annotated corpora (POS, NER and morphology), current-generation treebanks (constituency treebanks, dependency banks, propbanks) and next-generation databanks (logical form banks) simply by means of a very residual selection/extraction effort to get the appropriate "views" exposing the relevant layers of information.

2008

Better Alignments = Better Translations?
Kuzman Ganchev | João V. Graça | Ben Taskar
Proceedings of ACL-08: HLT

Building a Golden Collection of Parallel Multi-Language Word Alignment
João Graça | Joana Paulo Pardal | Luísa Coheur | Diamantino Caseiro
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper reports on producing manual word alignments over six different language pairs (all combinations of Portuguese, English, French and Spanish) (Graça et al., 2008). Word alignment for each language pair is performed over the first 100 sentences of the common test set from the Europarl corpus (Koehn, 2005), yielding 600 newly annotated sentences. This collection is publicly available at http://www.l2f.inesc-id.pt/resources/translation/. To our knowledge, it contains the first word alignment gold set for Portuguese, paired with three other languages. Moreover, to our knowledge, it is the first multi-language manually word-aligned parallel corpus in which the same sentences are annotated for each language pair. We started from the guidelines presented in (Mariño, 2005) and made several refinements: some due to under-specification in the original guidelines, others because of disagreement on some choices. This led to the development of an extensive new set of guidelines for multilingual word alignment annotation that, we believe, makes the alignment process less ambiguous. We evaluate inter-annotator agreement, obtaining an average of 91.6% agreement across the different language pairs.

2007

Frustratingly Hard Domain Adaptation for Dependency Parsing
Mark Dredze | John Blitzer | Partha Pratim Talukdar | Kuzman Ganchev | João Graça | Fernando Pereira
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)