Jerin Philip


2020

pdf bib
A Multilingual Parallel Corpora Collection Effort for Indian Languages
Shashank Siripragada | Jerin Philip | Vinay P. Namboodiri | C V Jawahar
Proceedings of the 12th Language Resources and Evaluation Conference

We present sentence aligned parallel corpora across 10 Indian Languages - Hindi, Telugu, Tamil, Malayalam, Gujarati, Urdu, Bengali, Oriya, Marathi, Punjabi, and English - many of which are categorized as low resource. The corpora are compiled from online sources which have content shared across languages. The corpora presented significantly extends present resources that are either not large enough or are restricted to a specific domain (such as health). We also provide a separate test corpus compiled from an independent online source that can be independently used for validating the performance in 10 Indian languages. Alongside, we report on the methods of constructing such corpora using tools enabled by recent advances in machine translation and cross-lingual retrieval using deep neural network based methods.

pdf bib
Monolingual Adapters for Zero-Shot Neural Machine Translation
Jerin Philip | Alexandre Berard | Matthias Gallé | Laurent Besacier
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose a novel adapter layer formalism for adapting multilingual models. They are more parameter-efficient than existing adapter layers while obtaining as good or better performance. The layers are specific to one language (as opposed to bilingual adapters) allowing to compose them and generalize to unseen language-pairs. In this zero-shot setting, they obtain a median improvement of +2.77 BLEU points over a strong 20-language multilingual Transformer baseline trained on TED talks.

2019

pdf bib
CVIT’s submissions to WAT-2019
Jerin Philip | Shashank Siripragada | Upendra Kumar | Vinay Namboodiri | C V Jawahar
Proceedings of the 6th Workshop on Asian Translation

This paper describes the Neural Machine Translation systems used by IIIT Hyderabad (CVIT-MT) for the translation tasks part of WAT-2019. We participated in tasks pertaining to Indian languages and submitted results for English-Hindi, Hindi-English, English-Tamil and Tamil-English language pairs. We employ Transformer architecture experimenting with multilingual models and methods for low-resource languages.

2018

pdf bib
CVIT-MT Systems for WAT-2018
Jerin Philip | Vinay P. Namboodiri | C.V. Jawahar
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation