Deepak Gupta

Also published as: Deepak Kumar Gupta, Deepa Gupta


pdf bib
A Semi-supervised Approach to Generate the Code-Mixed Text using Pre-trained Encoder and Transfer Learning
Deepak Gupta | Asif Ekbal | Pushpak Bhattacharyya
Findings of the Association for Computational Linguistics: EMNLP 2020

Code-mixing, the interleaving of two or more languages within a sentence or discourse is ubiquitous in multilingual societies. The lack of code-mixed training data is one of the major concerns for the development of end-to-end neural network-based models to be deployed for a variety of natural language processing (NLP) applications. A potential solution is to either manually create or crowd-source the code-mixed labelled data for the task at hand, but that requires much human efforts and often not feasible because of the language specific diversity in the code-mixed text. To circumvent the data scarcity issue, we propose an effective deep learning approach for automatically generating the code-mixed text from English to multiple languages without any parallel data. In order to train the neural network, we create synthetic code-mixed texts from the available parallel corpus by modelling various linguistic properties of code-mixing. Our codemixed text generator is built upon the encoder-decoder framework, where the encoder is augmented with the linguistic and task-agnostic features obtained from the transformer based language model. We also transfer the knowledge from a neural machine translation (NMT) to warm-start the training of code-mixed generator. Experimental results and in-depth analysis show the effectiveness of our proposed code-mixed text generation on eight diverse language pairs.

pdf bib
Reinforced Multi-task Approach for Multi-hop Question Generation
Deepak Gupta | Hardik Chauhan | Ravi Tej Akella | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 28th International Conference on Computational Linguistics

Question generation (QG) attempts to solve the inverse of question answering (QA) problem by generating a natural language question given a document and an answer. While sequence to sequence neural models surpass rule-based systems for QG, they are limited in their capacity to focus on more than one supporting fact. For QG, we often require multiple supporting facts to generate high-quality questions. Inspired by recent works on multi-hop reasoning in QA, we take up Multi-hop question generation, which aims at generating relevant questions based on supporting facts in the context. We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator. In addition, we also proposed a question-aware reward function in a Reinforcement Learning (RL) framework to maximize the utilization of the supporting facts. We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset, HotPotQA. Empirical evaluation shows our model to outperform the single-hop neural question generation models on both automatic evaluation metrics such as BLEU, METEOR, and ROUGE and human evaluation metrics for quality and coverage of the generated questions.


pdf bib
A Deep Neural Network based Approach for Entity Extraction in Code-Mixed Indian Social Media Text
Deepak Gupta | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
MMQA: A Multi-domain Multi-lingual Question-Answering Framework for English and Hindi
Deepak Gupta | Surabhi Kumari | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Uncovering Code-Mixed Challenges: A Framework for Linguistically Driven Question Generation and Neural Based Question Answering
Deepak Gupta | Pabitra Lenka | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 22nd Conference on Computational Natural Language Learning

Existing research on question answering (QA) and comprehension reading (RC) are mainly focused on the resource-rich language like English. In recent times, the rapid growth of multi-lingual web content has posed several challenges to the existing QA systems. Code-mixing is one such challenge that makes the task more complex. In this paper, we propose a linguistically motivated technique for code-mixed question generation (CMQG) and a neural network based architecture for code-mixed question answering (CMQA). For evaluation, we manually create the code-mixed questions for Hindi-English language pair. In order to show the effectiveness of our neural network based CMQA technique, we utilize two benchmark datasets, SQuAD and MMQA. Experiments show that our proposed model achieves encouraging performance on CMQG and CMQA.

pdf bib
Can Taxonomy Help? Improving Semantic Question Matching using Question Taxonomy
Deepak Gupta | Rajkumar Pujari | Asif Ekbal | Pushpak Bhattacharyya | Anutosh Maitra | Tom Jain | Shubhashis Sengupta
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we propose a hybrid technique for semantic question matching. It uses a proposed two-layered taxonomy for English questions by augmenting state-of-the-art deep learning models with question classes obtained from a deep learning based question classifier. Experiments performed on three open-domain datasets demonstrate the effectiveness of our proposed approach. We achieve state-of-the-art results on partial ordering question ranking (POQR) benchmark dataset. Our empirical analysis shows that coupling standard distributional features (provided by the question encoder) with knowledge from taxonomy is more effective than either deep learning or taxonomy-based knowledge alone.


pdf bib
IITP at IJCNLP-2017 Task 4: Auto Analysis of Customer Feedback using CNN and GRU Network
Deepak Gupta | Pabitra Lenka | Harsimran Bedi | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the IJCNLP 2017, Shared Tasks

Analyzing customer feedback is the best way to channelize the data into new marketing strategies that benefit entrepreneurs as well as customers. Therefore an automated system which can analyze the customer behavior is in great demand. Users may write feedbacks in any language, and hence mining appropriate information often becomes intractable. Especially in a traditional feature-based supervised model, it is difficult to build a generic system as one has to understand the concerned language for finding the relevant features. In order to overcome this, we propose deep Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based approaches that do not require handcrafting of features. We evaluate these techniques for analyzing customer feedback sentences on four languages, namely English, French, Japanese and Spanish. Our empirical analysis shows that our models perform well in all the four languages on the setups of IJCNLP Shared Task on Customer Feedback Analysis. Our model achieved the second rank in French, with an accuracy of 71.75% and third ranks for all the other languages.

pdf bib
IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Question Answering and Implicit Dialogue Identification
Titas Nandi | Chris Biemann | Seid Muhie Yimam | Deepak Gupta | Sarah Kohail | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper we present the system for Answer Selection and Ranking in Community Question Answering, which we build as part of our participation in SemEval-2017 Task 3. We develop a Support Vector Machine (SVM) based system that makes use of textual, domain-specific, word-embedding and topic-modeling features. In addition, we propose a novel method for dialogue chain identification in comment threads. Our primary submission won subtask C, outperforming other systems in all the primary evaluation metrics. We performed well in other English subtasks, ranking third in subtask A and eighth in subtask B. We also developed open source toolkits for all the three English subtasks by the name cQARank [].


pdf bib
Opinion Mining in a Code-Mixed Environment: A Case Study with Government Portals
Deepak Gupta | Ankit Lamba | Asif Ekbal | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing


pdf bib
IITP: Supervised Machine Learning for Aspect based Sentiment Analysis
Deepak Kumar Gupta | Asif Ekbal
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)


pdf bib
Morpho-syntactic Information for Automatic Error Analysis of Statistical Machine Translation Output
Maja Popović | Adrià de Gispert | Deepa Gupta | Patrik Lambert | Hermann Ney | José B. Mariño | Marcello Federico | Rafael Banchs
Proceedings on the Workshop on Statistical Machine Translation

pdf bib
Exploiting Word Transformation in Statistical Machine Translation from Spanish to English
Deepa Gupta | Marcello Federico
Proceedings of the 11th Annual conference of the European Association for Machine Translation