Malhar Kulkarni

Also published as: Malhar A. Kulkarni


2020

pdf bib
Challenge Dataset of Cognates and False Friend Pairs from Indian Languages
Diptesh Kanojia | Malhar Kulkarni | Pushpak Bhattacharyya | Gholamreza Haffari
Proceedings of the 12th Language Resources and Evaluation Conference

Cognates are present in multiple variants of the same text across different languages (e.g., “hund” in German and “hound” in the English language mean “dog”). They pose a challenge to various Natural Language Processing (NLP) applications such as Machine Translation, Cross-lingual Sense Disambiguation, Computational Phylogenetics, and Information Retrieval. A possible solution to address this challenge is to identify cognates across language pairs. In this paper, we describe the creation of two cognate datasets for twelve Indian languages namely Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. We digitize the cognate data from an Indian language cognate dictionary and utilize linked Indian language Wordnets to generate cognate sets. Additionally, we use the Wordnet data to create a False Friends’ dataset for eleven language pairs. We also evaluate the efficacy of our dataset using previously available baseline cognate detection approaches. We also perform a manual evaluation with the help of lexicographers and release the curated gold-standard dataset with this paper.

pdf bib
Part-of-Speech Annotation Challenges in Marathi
Gajanan Rane | Nilesh Joshi | Geetanjali Rane | Hanumant Redkar | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation

Part of Speech (POS) annotation is a significant challenge in natural language processing. The paper discusses issues and challenges faced in the process of POS annotation of the Marathi data from four domains viz., tourism, health, entertainment and agriculture. During POS annotation, a lot of issues were encountered. Some of the major ones are discussed in detail in this paper. Also, the two approaches viz., the lexical (L approach) and the functional (F approach) of POS tagging have been discussed and presented with examples. Further, some ambiguous cases in POS annotation are presented in the paper.

pdf bib
Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages
Diptesh Kanojia | Raj Dabre | Shubham Dewangan | Pushpak Bhattacharyya | Gholamreza Haffari | Malhar Kulkarni
Proceedings of the 28th International Conference on Computational Linguistics

Cognates are variants of the same lexical form across different languages; for example “fonema” in Spanish and “phoneme” in English are cognates, both of which mean “a unit of sound”. The task of automatic detection of cognates among any two languages can help downstream NLP tasks such as Cross-lingual Information Retrieval, Computational Phylogenetics, and Machine Translation. In this paper, we demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian Languages. Our approach introduces the use of context from a knowledge graph to generate improved feature representations for cognate detection. We, then, evaluate the impact of our cognate detection mechanism on neural machine translation (NMT), as a downstream task. We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages, namely, Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. Additionally, we create evaluation datasets for two more Indian languages, Konkani and Nepali. We observe an improvement of up to 18% points, in terms of F-score, for cognate detection. Furthermore, we observe that cognates extracted using our method help improve NMT quality by up to 2.76 BLEU. We also release our code, newly constructed datasets and cross-lingual models publicly.

2019

pdf bib
Introduction to Sanskrit Shabdamitra: An Educational Application of Sanskrit Wordnet
Malhar Kulkarni | Nilesh Joshi | Sayali Khare | Hanumant Redkar | Pushpak Bhattacharyya
Proceedings of the 6th International Sanskrit Computational Linguistics Symposium

pdf bib
Utilizing Word Embeddings based Features for Phylogenetic Tree Generation of Sanskrit Texts
Diptesh Kanojia | Abhijeet Dubey | Malhar Kulkarni | Pushpak Bhattacharyya | Gholemreza Haffari
Proceedings of the 6th International Sanskrit Computational Linguistics Symposium

pdf bib
An Introduction to the Textual History Tool
Diptesh Kanojia | Malhar Kulkarni | Pushpak Bhattacharyya | Eivind Kahrs
Proceedings of the 6th International Sanskrit Computational Linguistics Symposium

2017

pdf bib
Hindi Shabdamitra: A Wordnet based E-Learning Tool for Language Learning and Teaching
Hanumant Redkar | Sandhya Singh | Meenakshi Somasundaram | Dhara Gorasia | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA 2017)

In today’s technology driven digital era, education domain is undergoing a transformation from traditional approaches to more learner controlled and flexible methods of learning. This transformation has opened the new avenues for interdisciplinary research in the field of educational technology and natural language processing in developing quality digital aids for learning and teaching. The tool presented here - Hindi Shabhadamitra, developed using Hindi Wordnet for Hindi language learning, is one such e-learning tool. It has been developed as a teaching and learning aid suitable for formal school based curriculum and informal setup for self learning users. Besides vocabulary, it also provides word based grammar along with images and pronunciation for better learning and retention. This aid demonstrates that how a rich lexical resource like wordnet can be systematically remodeled for practical usage in the educational domain.

pdf bib
Hindi Shabdamitra: A Wordnet based E-Learning Tool for Language Learning and Teaching
Hanumant Redkar | Sandhya Singh | Dhara Gorasia | Meenakshi Somasundaram | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017)

2016

pdf bib
Use of Features for Accentuation of ghañanta Words
Samir Janardan Sohoni | Malhar A. Kulkarni
Proceedings of the 13th International Conference on Natural Language Processing

pdf bib
Verbframator:Semi-Automatic Verb Frame Annotator Tool with Special Reference to Marathi
Hanumant Redkar | Sandhya Singh | Nandini Ghag | Jai Paranjape | Nilesh Joshi | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 13th International Conference on Natural Language Processing

2014

pdf bib
Semi-Automatic Extension of Sanskrit Wordnet using Bilingual Dictionary
Sudha Bhingardive | Tanuja Ajotikar | Irawati Kulkarni | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the Seventh Global Wordnet Conference

pdf bib
Introduction to Synskarta: An Online Interface for Synset Creation with Special Reference to Sanskrit
Hanumant Redkar | Jai Paranjape | Nilesh Joshi | Irawati Kulkarni | Malhar Kulkarni | Pushpak Bhattacharyya
Proceedings of the 11th International Conference on Natural Language Processing

2012

pdf bib
Semantic Processing of Compounds in Indian Languages
Amba Kulkarni | Soma Paul | Malhar Kulkarni | Anil Kumar | Nitesh Surtani
Proceedings of COLING 2012