Anil Kumar Singh



NLPRL at WNUT-2020 Task 2: ELMo-based System for Identification of COVID-19 Tweets
Rajesh Kumar Mundotiya | Rupjyoti Baruah | Bhavana Srivastava | Anil Kumar Singh
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

The Coronavirus pandemic has dominated news on social media for many months. Efforts are being made to reduce its spread, casualties, and new infections. For this purpose, information about infected people and their related symptoms, as available on social media such as Twitter, can help in prevention and in taking precautions. This is an example of using noisy text processing for disaster management. This paper discusses the NLPRL results in Shared Task-2 of the WNUT-2020 workshop. We treat the problem as binary classification and use pre-trained ELMo embeddings with GRU units. This approach classifies the tweets with an accuracy of 80.85% and an F1-score of 78.54% on the provided test dataset. The experimental code is available online.


NLPRL at WAT2019: Transformer-based Tamil – English Indic Task Neural Machine Translation System
Amit Kumar | Anil Kumar Singh
Proceedings of the 6th Workshop on Asian Translation

This paper describes the Machine Translation system for the Tamil-English Indic Task organized at WAT 2019. We use a Transformer-based architecture for Neural Machine Translation.


NLPRL-IITBHU at SemEval-2018 Task 3: Combining Linguistic Features and Emoji pre-trained CNN for Irony Detection in Tweets
Harsh Rangwani | Devang Kulshreshtha | Anil Kumar Singh
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes our participation in SemEval 2018 Task 3 on Irony Detection in Tweets. We combine linguistic features with pre-trained activations of a neural network. The CNN is trained on the emoji prediction task. We combine the two feature sets and feed them into an XGBoost Classifier for classification. Subtask-A involves classification of tweets into ironic and non-ironic instances, whereas Subtask-B involves classification of each tweet into one of four classes: non-ironic, verbal irony, situational irony or other verbal irony. It is observed that combining features from these two different feature spaces improves our system results. We leverage the SMOTE algorithm to handle the problem of class imbalance in Subtask-B. Our final model achieves an F1-score of 0.65 and 0.47 on Subtask-A and Subtask-B respectively. Our system ranks 4th on both subtasks, outperforming the baseline by 6% on Subtask-A and 14% on Subtask-B.

Experiments on Morphological Reinflection: CoNLL-2018 Shared Task
Rishabh Jain | Anil Kumar Singh
Proceedings of the CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

How emotional are you? Neural Architectures for Emotion Intensity Prediction in Microblogs
Devang Kulshreshtha | Pranav Goel | Anil Kumar Singh
Proceedings of the 27th International Conference on Computational Linguistics

Social media based micro-blogging sites like Twitter have become a common source of real-time information (impacting organizations and their strategies) and are used for expressing emotions and opinions. Automated analysis of such content therefore rises in importance. To this end, we explore the viability of using deep neural networks on the specific task of emotion intensity prediction in tweets. We propose a neural architecture combining convolutional and fully connected layers in a non-sequential manner, done for the first time in the context of natural language based tasks. Combined with lexicon-based features along with transfer learning, our model achieves state-of-the-art performance, outperforming the previous system by 0.044 or 4.4% Pearson correlation on the WASSA’17 EmoInt shared task dataset. We investigate the performance of deep multi-task learning models trained for all emotions at once in a unified architecture and get encouraging results. Experiments performed on evaluating correlation between emotion pairs offer interesting insights into the relationship between them.

Di-LSTM Contrast : A Deep Neural Network for Metaphor Detection
Krishnkant Swarnkar | Anil Kumar Singh
Proceedings of the Workshop on Figurative Language Processing

The contrast between the contextual and general meaning of a word serves as an important clue for detecting its metaphoricity. In this paper, we present a deep neural architecture for metaphor detection which exploits this contrast. Additionally, we use cost-sensitive learning by re-weighting examples, and baseline features like concreteness ratings, POS and WordNet-based features. Our best-performing system achieves an overall F1 score of 0.570 on the All POS category and 0.605 on the Verbs category at the Metaphor Shared Task 2018.

IIT (BHU) Submission for the ACL Shared Task on Named Entity Recognition on Code-switched Data
Shashwat Trivedi | Harsh Rangwani | Anil Kumar Singh
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

This paper describes the best performing system for the shared task on Named Entity Recognition (NER) on code-switched data for the language pair Spanish-English (ENG-SPA). We introduce a gated neural architecture for the NER task. Our final model achieves an F1 score of 63.76%, outperforming the baseline by 10%.

IIT (BHU) System for Indo-Aryan Language Identification (ILI) at VarDial 2018
Divyanshu Gupta | Gourav Dhakad | Jayprakash Gupta | Anil Kumar Singh
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

Text language identification is a Natural Language Processing task of identifying and recognizing a given language out of many different languages from a piece of text. This paper describes our submission to the ILI 2018 shared task, which includes the identification of 5 closely related Indo-Aryan languages. We developed a word-level LSTM (Long Short-Term Memory) model, a specific type of Recurrent Neural Network model, for this task. Given a sentence, our model converts each word of the sentence into a trainable word embedding, feeds the embeddings into our LSTM network and finally predicts the language. We obtained an F1 macro score of 0.836, ranking 5th in the task.

Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture
Soumil Mandal | Anil Kumar Singh
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

An accurate language identification tool is an absolute necessity for building complex NLP systems to be used on code-mixed data. A lot of work has recently been done on this problem, but there is still room for improvement. Inspired by the recent advancements in neural network architectures for computer vision tasks, we have implemented multichannel neural networks combining CNN and LSTM for word-level language identification of code-mixed data. Combining this with a Bi-LSTM-CRF context capture module, accuracies of 93.28% and 93.32% are achieved on our two testing sets.


Experiments on Morphological Reinflection: CoNLL-2017 Shared Task
Akhilesh Sudhakar | Anil Kumar Singh
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection

IJCNLP-2017 Task 3: Review Opinion Diversification (RevOpiD-2017)
Anil Kumar Singh | Avijit Thawani | Mayank Panchal | Anubhav Gupta | Julian McAuley
Proceedings of the IJCNLP 2017, Shared Tasks

Unlike Entity Disambiguation in web search results, Opinion Disambiguation is a relatively unexplored topic. The RevOpiD shared task at IJCNLP-2017 aimed to attract attention towards this research problem. In this paper, we summarize the first run of this task and introduce a new dataset that we have annotated for the purpose of evaluating Opinion Mining, Summarization and Disambiguation methods.

IIT (BHU): System Description for LSDSem’17 Shared Task
Pranav Goel | Anil Kumar Singh
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

This paper describes an ensemble system submitted as part of the LSDSem Shared Task 2017, the Story Cloze Test. The main conclusion from our results is that an approach based on semantic similarity alone may not be enough for this task. We test various approaches and compare them with two ensemble systems. One is based on voting and the other on a logistic regression based classifier. Our final system is able to outperform the previous state of the art for the Story Cloze Test. Another very interesting observation is the performance of the sentiment-based approach, which works almost as well on its own as our final ensemble system.

Word Transduction for Addressing the OOV Problem in Machine Translation for Similar Resource-Scarce Languages
Shashikant Sharma | Anil Kumar Singh
Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing (FSMNLP 2017)

Reference Scope Identification for Citances Using Convolutional Neural Networks
Saurav Jha | Aanchal Chaurasia | Akhilesh Sudhakar | Anil Kumar Singh
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

Neural Morphological Disambiguation Using Surface and Contextual Morphological Awareness
Akhilesh Sudhakar | Anil Kumar Singh
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)


IIT (BHU) Submission on the CoNLL-2016 Shared Task: Shallow Discourse Parsing using Semantic Lexicons
Manpreet Kaur | Nishu Kumari | Anil Kumar Singh | Rajeev Sangal
Proceedings of the CoNLL-16 shared task

Proceedings of the 13th International Conference on Natural Language Processing
Dipti Misra Sharma | Rajeev Sangal | Anil Kumar Singh
Proceedings of the 13th International Conference on Natural Language Processing


Shallow Discourse Parsing with Syntactic and (a Few) Semantic Features
Shubham Mukherjee | Abhishek Tiwari | Mohit Gupta | Anil Kumar Singh
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task


SSF: A Common Representation Scheme for Language Analysis for Language Technology Infrastructure Development
Akshar Bharati | Rajeev Sangal | Dipti Misra Sharma | Anil Kumar Singh
Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT


LIMSI Submission for the WMT’13 Quality Estimation Task: an Experiment with N-Gram Posteriors
Anil Kumar Singh | Guillaume Wisniewski | François Yvon
Proceedings of the Eighth Workshop on Statistical Machine Translation

A corpus of post-edited translations (Un corpus d’erreurs de traduction) [in French]
Guillaume Wisniewski | Anil Kumar Singh | Natalia Segal | François Yvon
Proceedings of TALN 2013 (Volume 2: Short Papers)


A GUI to Detect and Correct Errors in Hindi Dependency Treebank
Rahul Agarwal | Bharat Ram Ambati | Anil Kumar Singh
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

A treebank is an important resource for developing many NLP based tools. Errors in the treebank may lead to errors in the tools that use it. It is essential to ensure the quality of a treebank before it can be deployed for other purposes. Automatic (or semi-automatic) detection of errors in the treebank can reduce the manual work required to find and remove errors. Usually, the errors found automatically are manually corrected by the annotators. There is not much work reported so far on error correction tools which help the annotators in correcting errors efficiently. In this paper, we present such an error correction tool that is an extension of the error detection method described earlier (Ambati et al., 2010; Ambati et al., 2011; Agarwal et al., 2012).

A Concise Query Language with Search and Transform Operations for Corpora with Multiple Levels of Annotation
Anil Kumar Singh
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The usefulness of annotated corpora is greatly increased if there is an associated tool that can allow various kinds of operations to be performed in a simple way. Different kinds of annotation frameworks and many query languages for them have been proposed, including some to deal with multiple layers of annotation. We present here an easy to learn query language for a particular kind of annotation framework based on ‘threaded trees’, which are somewhere between the complete order of a tree and the anarchy of a graph. Through ‘typed’ threads, they can allow multiple levels of annotation in the same document. Our language has a simple, intuitive and concise syntax and high expressive power. It allows not only searching for complicated patterns with short queries but also data manipulation and specification of arbitrary return values. Many of the commonly used tasks that otherwise require writing programs can be performed with one or more queries. We compare the language with some others and try to evaluate it.


An Integrated Digital Tool for Accessing Language Resources
Anil Kumar Singh | Bharat Ram Ambati
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Language resources can be classified under several categories. To be able to query and operate on all (or most of) these categories using a single digital tool would be very helpful for a large number of researchers working on languages. We describe such a tool in this paper. It is different from other such tools in that it allows querying and transformation on different kinds of resources (such as corpora, lexicon and language models) with the same framework. Search options can be given based on the kind of resource being queried. It is possible to select a matched resource and open it for editing in the specialized interfaces with which that resource is associated. The tool also allows the extracted or modified data to be saved separately, apart from having the usual facilities like displaying the results in KeyWord-In-Context (KWIC) format. We also present the notation used for querying and transformation, which is comparable to but different from the Corpus Query Language (CQL).

Grammar Extraction from Treebanks for Hindi and Telugu
Prasanth Kolachina | Sudheer Kolachina | Anil Kumar Singh | Samar Husain | Viswanath Naidu | Rajeev Sangal | Akshar Bharati
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Grammars play an important role in many Natural Language Processing (NLP) applications. The traditional approach of creating grammars manually, besides being labor-intensive, has several limitations. With the availability of large scale syntactically annotated treebanks, it is now possible to automatically extract an approximate grammar of a language in any of the existing formalisms from a corresponding treebank. In this paper, we present a basic approach to extract grammars from dependency treebanks of two Indian languages, Hindi and Telugu. The process of grammar extraction requires a generalization mechanism. Towards this end, we explore an approach which relies on generalization of argument structure over the verbs based on their syntactic similarity. Such a generalization counters the effect of data sparseness in the treebanks. A grammar extracted using this system can not only expand already existing knowledge bases for NLP tasks such as parsing, but also aid in the creation of grammars for languages where none exist. Further, we show that the grammar extraction process can help in identifying annotation errors and thus aid in the task of treebank validation.


From Bag of Languages to Family Trees From Noisy Corpus
Taraka Rama | Anil Kumar Singh
Proceedings of the International Conference RANLP-2009

Modeling Letter-to-Phoneme Conversion as a Phrase Based Statistical Machine Translation Problem with Minimum Error Rate Training
Taraka Rama | Anil Kumar Singh | Sudheer Kolachina
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium


Estimating the Resource Adaption Cost from a Resource Rich Language to a Similar Resource Poor Language
Anil Kumar Singh | Kiran Pala | Harshit Surana
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Developing resources which can be used for Natural Language Processing is an extremely difficult task for any language, but is even more so for less privileged (or less computerized) languages. One way to overcome this difficulty is to adapt the resources of a linguistically close resource rich language. In this paper we discuss how the cost of such adaption can be estimated using subjective and objective measures of linguistic similarity for allocating financial resources, time, manpower etc. Since this is the first work of its kind, the method described in this paper should be seen as only a preliminary method, indicative of how better methods can be developed. Corpora of several less computerized languages had to be collected for the work described in the paper, which was difficult because for many of these varieties there is not much electronic data available. Even when such data exists, it is in non-standard encodings, which means that we had to build encoding converters for these varieties. The varieties we have focused on are some of those spoken in the South Asian region.

A More Discerning and Adaptable Multilingual Transliteration Mechanism for Indian Languages
Harshit Surana | Anil Kumar Singh
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

A Mechanism to Provide Language-Encoding Support and an NLP Friendly Editor
Anil Kumar Singh
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

Natural Language Processing for Less Privileged Languages: Where do we come from? Where are we going?
Anil Kumar Singh
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

Named Entity Recognition for South and South East Asian Languages: Taking Stock
Anil Kumar Singh
Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages


Can Corpus Based Measures be Used for Comparative Study of Languages?
Anil Kumar Singh | Harshit Surana
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology


Study of Some Distance Measures for Language and Encoding Identification
Anil Kumar Singh
Proceedings of the Workshop on Linguistic Distances


Comparison, Selection and Use of Sentence Alignment Algorithms for New Language Pairs
Anil Kumar Singh | Samar Husain
Proceedings of the ACL Workshop on Building and Using Parallel Texts