Kashif Shah


2018

Neural Network based Extreme Classification and Similarity Models for Product Matching
Kashif Shah | Selcuk Kopru | Jean-David Ruvini
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

Matching a seller-listed item to an appropriate product has become one of the most fundamental and significant steps for e-commerce platforms that offer a product-based experience. It has a huge impact on search effectiveness, search engine optimization, product reviews, and product price estimation, along with many other advantages for a better user experience. As significant and vital as it has become, the challenge of tackling its complexity has grown with the exponential growth of individual and business sellers trading millions of products every day. We explored two approaches: classification based on a shallow neural network and similarity based on a deep siamese network. These models outperform the baseline by more than 5% in terms of accuracy and are capable of extremely efficient training and inference.

2016

Creation of comparable corpora for English-Urdu, Arabic, Persian
Murad Abouammoh | Kashif Shah | Ahmet Aker
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Statistical Machine Translation (SMT) relies on the availability of rich parallel corpora. However, in the case of under-resourced languages or some specific domains, parallel corpora are not readily available. This leads to under-performing machine translation systems in those sparse data settings. To overcome the low availability of parallel resources, the machine translation community has recognized the potential of using comparable resources as training data. However, most efforts have focused on European languages, and fewer on Middle Eastern languages. In this study, we report comparable corpora created from news articles for the language pairs English-{Arabic, Persian, Urdu}. The data was collected over a period of a year and covers Arabic, Persian, and Urdu. Furthermore, using English as a pivot language, comparable corpora that involve more than one language can be created, e.g. English-Arabic-Persian, English-Arabic-Urdu, English-Urdu-Persian, etc. Upon request, the data can be provided for research purposes.

Large-scale Multitask Learning for Machine Translation Quality Estimation
Kashif Shah | Lucia Specia
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

SHEF-Multimodal: Grounding Machine Translation on Images
Kashif Shah | Josiah Wang | Lucia Specia
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Word embeddings and discourse information for Quality Estimation
Carolina Scarton | Daniel Beck | Kashif Shah | Karin Sim Smith | Lucia Specia
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

SHEF-LIUM-NN: Sentence level Quality Estimation with Neural Network Features
Kashif Shah | Fethi Bougares | Loïc Barrault | Lucia Specia
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

USFD at SemEval-2016 Task 1: Putting different State-of-the-Arts into a Box
Ahmet Aker | Frederic Blain | Andres Duque | Marina Fomicheva | Jurica Seva | Kashif Shah | Daniel Beck
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Investigating Continuous Space Language Models for Machine Translation Quality Estimation
Kashif Shah | Raymond W. M. Ng | Fethi Bougares | Lucia Specia
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

SHEF-NN: Translation Quality Estimation with Neural Networks
Kashif Shah | Varvara Logacheva | Gustavo Paetzold | Frederic Blain | Daniel Beck | Fethi Bougares | Lucia Specia
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

SHEF-Lite 2.0: Sparse Multi-task Gaussian Processes for Translation Quality Estimation
Daniel Beck | Kashif Shah | Lucia Specia
Proceedings of the Ninth Workshop on Statistical Machine Translation

An efficient and user-friendly tool for machine translation quality estimation
Kashif Shah | Marco Turchi | Lucia Specia
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a new version of QUEST, an open source framework for machine translation quality estimation, which brings a number of improvements: (i) it provides a Web interface and functionalities such that non-expert users, e.g. translators or lay-users of machine translations, can get quality predictions (or internal features of the framework) for translations without having to install the toolkit, obtain resources or build prediction models; (ii) it significantly improves over the previous runtime performance by keeping resources (such as language models) in memory; (iii) it provides an option for users to submit the source text only and automatically obtain translations from Bing Translator; (iv) it provides a ranking of multiple translations submitted by users for each source text according to their estimated quality. We exemplify the use of this new version through some experiments with the framework.

2013

SHEF-Lite: When Less is More for Translation Quality Estimation
Daniel Beck | Kashif Shah | Trevor Cohn | Lucia Specia
Proceedings of the Eighth Workshop on Statistical Machine Translation

QuEst - A translation quality estimation framework
Lucia Specia | Kashif Shah | Jose G.C. de Souza | Trevor Cohn
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2011

LIUM’s SMT Machine Translation Systems for WMT 2011
Holger Schwenk | Patrik Lambert | Loïc Barrault | Christophe Servan | Sadaf Abdul-Rauf | Haithem Afli | Kashif Shah
Proceedings of the Sixth Workshop on Statistical Machine Translation

Parametric Weighting of Parallel Data for Statistical Machine Translation
Kashif Shah | Loïc Barrault | Holger Schwenk
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

Translation Model Adaptation by Resampling
Kashif Shah | Loïc Barrault | Holger Schwenk
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR