Dhirendra Singh


2016

pdf bib
Multiword Expressions Dataset for Indian Languages
Dhirendra Singh | Sudha Bhingardive | Pushpak Bhattacharyya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Multiword Expressions (MWEs) are used frequently in natural languages, but understanding the diversity in MWEs is one of the open problem in the area of Natural Language Processing. In the context of Indian languages, MWEs play an important role. In this paper, we present MWEs annotation dataset created for Indian languages viz., Hindi and Marathi. We extract possible MWE candidates using two repositories: 1) the POS-tagged corpus and 2) the IndoWordNet synsets. Annotation is done for two types of MWEs: compound nouns and light verb constructions. In the process of annotation, human annotators tag valid MWEs from these candidates based on the standard guidelines provided to them. We obtained 3178 compound nouns and 2556 light verb constructions in Hindi and 1003 compound nouns and 2416 light verb constructions in Marathi using two repositories mentioned before. This created resource is made available publicly and can be used as a gold standard for Hindi and Marathi MWE systems.

pdf bib
Synset Ranking of Hindi WordNet
Sudha Bhingardive | Rajita Shukla | Jaya Saraswati | Laxmi Kashyap | Dhirendra Singh | Pushpak Bhattacharyya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Word Sense Disambiguation (WSD) is one of the open problems in the area of natural language processing. Various supervised, unsupervised and knowledge based approaches have been proposed for automatically determining the sense of a word in a particular context. It has been observed that such approaches often find it difficult to beat the WordNet First Sense (WFS) baseline which assigns the sense irrespective of context. In this paper, we present our work on creating the WFS baseline for Hindi language by manually ranking the synsets of Hindi WordNet. A ranking tool is developed where human experts can see the frequency of the word senses in the sense-tagged corpora and have been asked to rank the senses of a word by using this information and also his/her intuition. The accuracy of WFS baseline is tested on several standard datasets. F-score is found to be 60%, 65% and 55% on Health, Tourism and News datasets respectively. The created rankings can also be used in other NLP applications viz., Machine Translation, Information Retrieval, Text Summarization, etc.

2015

pdf bib
Using Word Embeddings for Bilingual Unsupervised WSD
Sudha Bhingardive | Dhirendra Singh | Rudramurthy V | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

pdf bib
Detection of Multiword Expressions for Hindi Language using Word Embeddings and WordNet-based Features
Dhirendra Singh | Sudha Bhingardive | Kevin Patel | Pushpak Bhattacharyya
Proceedings of the 12th International Conference on Natural Language Processing

pdf bib
Unsupervised Most Frequent Sense Detection using Word Embeddings
Sudha Bhingardive | Dhirendra Singh | Rudramurthy V | Hanumant Redkar | Pushpak Bhattacharyya
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
Merging Verb Senses of Hindi WordNet using Word Embeddings
Sudha Bhingardive | Ratish Puduppully | Dhirendra Singh | Pushpak Bhattacharyya
Proceedings of the 11th International Conference on Natural Language Processing